Add packages.json manifest generator (draft for #9966) (#9967)
* Add packages.json manifest generator (draft for #9966)
Adds scripts/build_index.lua + a CI workflow that runs it on every push
to master and publishes dist/packages.json to the gh-pages branch.
The manifest is a flat, declarative view of the repository (name, version,
description, license, homepage, repository_url, download_url per package),
generated by reusing xmake's own package loader so no xmake.lua source
parsing is involved. $(version) placeholders in URLs are resolved against
the latest version, so all URLs in the output are concrete.
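
For illustration only, the placeholder resolution described above can be
sketched in Python. This is not the code in scripts/build_index.lua;
resolve_url and has_unresolved are hypothetical names, and the sketch assumes
plain textual substitution with no per-package version transformation.

```python
PLACEHOLDER = "$(version)"


def resolve_url(url_template: str, version: str) -> str:
    """Replace every $(version) placeholder with a concrete version string."""
    return url_template.replace(PLACEHOLDER, version)


def has_unresolved(url: str) -> bool:
    """Report whether a URL still contains a $(version) placeholder."""
    return PLACEHOLDER in url


if __name__ == "__main__":
    # Hypothetical template and version, used only to exercise the helpers.
    template = "https://example.com/foo/archive/v$(version).tar.gz"
    print(resolve_url(template, "1.2.3"))
```

A check like has_unresolved over every emitted URL is one way to arrive at
an "unresolved placeholders" count such as the one reported below.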
Intended consumer: repology.org indexing, which is currently blocked on
exactly this kind of structured source (see repology/repology-updater#1585
and the discussion in #9966). The schema is open for review: repository_url
currently mirrors homepage, which works for the common case of a github.com
URL but may need refinement.
Sanity check on dev: generated dist/packages.json successfully for 1952
packages, 0 unresolved $(version) placeholders. Coverage gaps (~3% missing
homepage, ~3% missing version, ~3% missing download_url) are real authoring
gaps in individual packages, not bugs in the generator.
* build_index: use semver-aware version sort
String-sort on version keys ranks "1.10.0" before "1.2.0", so the "latest
version" picked for ~122 packages was wrong (e.g. aws-c-common reported
v0.9.3 instead of v0.12.6; aom 3.9.1 instead of 3.13.1).
Switch to a semver-aware ascending sort using core.base.semver, matching
the pattern already used in scripts/build_artifacts.lua. Falls back to
string comparison for the small number of packages whose version strings
are not valid semver (git refs, etc.).
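
A minimal Python sketch of the ordering problem and the fix, for readers who
do not know core.base.semver. The regex-based parser is a simplification
(numeric major.minor.patch only, optional leading "v") and is an assumption
of this sketch, not the xmake implementation.

```python
import re
from functools import cmp_to_key

# Simplified semver matcher: optional "v", then three numeric components.
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse(version):
    """Return (major, minor, patch) as ints, or None if not semver-shaped."""
    m = _SEMVER.match(version)
    return tuple(int(x) for x in m.groups()) if m else None


def compare(a, b):
    """Compare numerically when both parse; otherwise compare as strings."""
    pa, pb = parse(a), parse(b)
    if pa is not None and pb is not None:
        return (pa > pb) - (pa < pb)
    return (a > b) - (a < b)  # fallback for git refs and similar


def latest_version(versions):
    """Pick the last element of an ascending semver-aware sort."""
    return sorted(versions, key=cmp_to_key(compare))[-1]


if __name__ == "__main__":
    versions = ["1.2.0", "1.10.0", "1.9.3"]
    print(sorted(versions)[-1])      # plain string sort
    print(latest_version(versions))  # semver-aware sort
```

Mixing the numeric and string comparisons in one comparator is not guaranteed
to give a consistent order when a list contains both semver and non-semver
strings; the sketch only aims to show the common case.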
* build_index: derive repository_url from add_urls
Follow waruqi's suggestion in xmake-io/xmake-repo#9967: instead of
mirroring homepage (which is often a project site like https://abseil.io),
walk add_urls and normalize the first forge-shaped entry to its bare
<scheme>://<host>/<owner>/<repo> root.
- /-/archive/ and trailing .git are unambiguous markers, accepted on any
host (works for self-hosted GitLab and code.videolan.org).
- /archive/, /releases/, /get/ are only stripped on known forge hosts to
avoid false positives on tarball mirrors that happen to use those words
as directory names (e.g. download.imagemagick.org/.../releases/).
- Fall back to homepage only when the homepage itself is already a
forge <owner>/<repo> URL.
Result on current dev tip: 1689/1952 packages now have a repository_url
(86.5%), of which 754 differ from homepage and now point at the actual
repo (abseil → github.com/abseil/abseil-cpp, x264 → code.videolan.org/...,
etc.). No emitted URL retains a .git suffix or an /archive/ path segment.
The remaining 263 are packages whose add_urls points at a non-forge
tarball mirror. Fixing those requires either declaring an explicit
repository field per package or adding a metadata-only field upstream.
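
The normalization rules in this commit can be sketched in Python as follows.
This is an illustration, not the Lua in scripts/build_index.lua: the function
names, the KNOWN_FORGES host list, and the requirement of at least two path
segments are all assumptions of the sketch.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Assumed list of "known forge" hosts; the real list may differ.
KNOWN_FORGES = {"github.com", "gitlab.com", "bitbucket.org"}
# Markers that are only trusted on known forge hosts.
FORGE_ONLY_MARKERS = {"archive", "releases", "get"}


def _split(url: str):
    parts = urlsplit(url)
    segs = [s for s in parts.path.split("/") if s]
    return parts.scheme, parts.netloc.lower(), segs


def repo_root(url: str) -> Optional[str]:
    """Reduce a download URL to its repository root, or None if not forge-shaped."""
    scheme, host, segs = _split(url)
    if scheme not in ("http", "https") or not host:
        return None
    base = f"{scheme}://{host}/"
    # Trailing ".git" is unambiguous, so it is accepted on any host.
    if len(segs) >= 2 and segs[-1].endswith(".git"):
        return base + "/".join(segs[:-1] + [segs[-1][: -len(".git")]])
    # "/-/archive/" is unambiguous too (self-hosted GitLab style).
    for i in range(2, len(segs) - 1):
        if segs[i] == "-" and segs[i + 1] == "archive":
            return base + "/".join(segs[:i])
    # "/archive/", "/releases/", "/get/" are stripped only on known forges.
    if host in KNOWN_FORGES and len(segs) >= 3 and segs[2] in FORGE_ONLY_MARKERS:
        return base + "/".join(segs[:2])
    return None


def derive_repository_url(urls: Iterable[str], homepage: Optional[str]) -> Optional[str]:
    """First forge-shaped URL wins; homepage is used only if it is already a repo URL."""
    for url in urls:
        root = repo_root(url)
        if root:
            return root
    if homepage:
        scheme, host, segs = _split(homepage)
        if host in KNOWN_FORGES and len(segs) == 2:
            return f"{scheme}://{host}/" + "/".join(segs)
    return None
```

With these rules, a hypothetical mirror URL such as
https://mirror.example.org/pub/releases/foo-1.0.tar.gz yields no
repository_url, because "releases" is only treated as a marker on the
assumed forge hosts.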
* build_index: rename output to repology_packages_index.json
Per maintainer review on #9967: since this index is Repology-specific,
name the file accordingly. Publish target (gh-pages) is unchanged.
* build_index: consolidate index workflow into sync.yml
Per maintainer feedback on #9967: drop the separate build-index.yml
and append the build+publish step to sync.yml so a single CI cadence
(hourly cron) covers both dev->master sync and the repology index
update.
Adds a sanity check on the manifest count (>1000) so a broken
generator run never publishes an empty or truncated index. Publishes
via SSH (key already installed in sync.yml above), mirroring the
deploy.yml pattern.
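
As an illustration of the count guard, a Python sketch follows. The actual
check lives in the sync.yml workflow and may be implemented differently; the
function name check_manifest and the assumption that the manifest's top
level holds one entry per package are hypothetical.

```python
import json

MIN_PACKAGES = 1000  # threshold stated in this commit: count must exceed 1000


def check_manifest(text: str, minimum: int = MIN_PACKAGES) -> int:
    """Parse the manifest and fail unless it holds more than `minimum` entries."""
    data = json.loads(text)  # raises on truncated or otherwise invalid JSON
    count = len(data)        # assumes one top-level entry per package
    if count <= minimum:
        raise ValueError(
            f"manifest has {count} entries, expected more than {minimum}"
        )
    return count


if __name__ == "__main__":
    # Hypothetical manifest with 1001 placeholder entries.
    sample = json.dumps([{"name": f"pkg{i}"} for i in range(1001)])
    print(check_manifest(sample))
```

Raising before the publish step means a failed generator run leaves the
previously published index in place.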