The Docker registry protocol has very limited search and index functionality - you can (if enabled on the server) list all repositories, list tags for each repository, and then download the manifest for each tag. This obviously is very inefficient if you want to ask questions like:
* "What containers are available with org.flatpak.metadata set, and what are their human-readable names?"
* "Is there a newer version available for any of these 50 installed applications?" (especially without fingerprinting your machine by asking about each application)
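To make the cost concrete, here is a minimal Go sketch of the traversal described above. The endpoint paths (/v2/_catalog, /v2/<name>/tags/list, /v2/<name>/manifests/<reference>) are those of the Docker Registry HTTP API V2; walkRegistry, fetchFunc, httpFetch, and the sample repository names are hypothetical, and the sketch ignores catalog pagination, authentication, and manifest Accept headers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// fetchFunc retrieves the body at a URL. It is injectable so the walk
// can be exercised without a network (hypothetical helper type).
type fetchFunc func(url string) ([]byte, error)

// httpFetch is a plain GET-based fetchFunc for use against a live registry.
func httpFetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// getJSON fetches a URL and decodes the JSON body into v.
func getJSON(fetch fetchFunc, url string, v interface{}) error {
	body, err := fetch(url)
	if err != nil {
		return err
	}
	return json.Unmarshal(body, v)
}

// walkRegistry lists all repositories, then the tags of each repository,
// then fetches the manifest of each tag. It returns the number of requests
// issued, which is 1 + (repositories) + (total tags).
func walkRegistry(base string, fetch fetchFunc) (int, error) {
	requests := 0

	var catalog struct {
		Repositories []string `json:"repositories"`
	}
	requests++
	if err := getJSON(fetch, base+"/v2/_catalog", &catalog); err != nil {
		return requests, err
	}

	for _, repo := range catalog.Repositories {
		var tags struct {
			Tags []string `json:"tags"`
		}
		requests++
		if err := getJSON(fetch, base+"/v2/"+repo+"/tags/list", &tags); err != nil {
			return requests, err
		}
		for _, tag := range tags.Tags {
			requests++
			if _, err := fetch(base + "/v2/" + repo + "/manifests/" + tag); err != nil {
				return requests, err
			}
		}
	}
	return requests, nil
}

func main() {
	// In-memory stand-in for a registry so the sketch runs offline;
	// pass httpFetch and a real base URL to walk a live registry.
	fake := map[string]string{
		"/v2/_catalog":          `{"repositories":["app-one","app-two"]}`,
		"/v2/app-one/tags/list": `{"name":"app-one","tags":["latest","stable"]}`,
		"/v2/app-two/tags/list": `{"name":"app-two","tags":["latest"]}`,
	}
	fetch := func(url string) ([]byte, error) {
		if body, ok := fake[url]; ok {
			return []byte(body), nil
		}
		return []byte(`{}`), nil // manifests are not inspected in this sketch
	}
	n, err := walkRegistry("", fetch)
	fmt.Println("requests:", n, "error:", err)
}
```

Every client that wants an answer has to repeat this whole walk, and the request count grows with the total number of tags in the registry.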
We're planning to distribute desktop applications as Flatpaks on registry.fedoraproject.org, and to be able to make GNOME Software and updates work properly, we need a way to efficiently query the metadata of the registry.
To handle this, I wrote a new server, "Flagstate" (https://github.com/owtaylor/flagstate). It sits alongside a docker/distribution registry instance. At initialization, it retrieves all the metadata from the registry server and stores it in a database. It then incrementally updates the database in response to webhook notifications from the registry.
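As a rough illustration of the incremental-update idea (not Flagstate's actual code), the Go sketch below parses a docker/distribution notification envelope and works out which repositories need to be re-read. The envelope fields used here ("events", "action", "target.repository", "target.tag", "target.digest") follow the registry's notification format; reposToRefresh, notificationHandler, and the "refresh the whole repository" strategy are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// envelope mirrors the subset of the registry notification body that
// this sketch needs.
type envelope struct {
	Events []event `json:"events"`
}

type event struct {
	Action string `json:"action"` // e.g. "push", "pull", "delete"
	Target struct {
		Digest     string `json:"digest"`
		Repository string `json:"repository"`
		Tag        string `json:"tag"`
	} `json:"target"`
}

// reposToRefresh returns the repositories touched by push or delete
// events, de-duplicated and in first-seen order. Pull events do not
// change registry contents, so they are ignored.
func reposToRefresh(body []byte) ([]string, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var repos []string
	for _, ev := range env.Events {
		if ev.Action != "push" && ev.Action != "delete" {
			continue
		}
		repo := ev.Target.Repository
		if repo == "" || seen[repo] {
			continue
		}
		seen[repo] = true
		repos = append(repos, repo)
	}
	return repos, nil
}

// notificationHandler wires reposToRefresh to an HTTP endpoint. The
// refresh callback (hypothetical) would re-read one repository from the
// registry and update the database.
func notificationHandler(refresh func(repo string)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Cap the body size so an oversized POST cannot exhaust memory.
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		repos, err := reposToRefresh(body)
		if err != nil {
			http.Error(w, "bad notification", http.StatusBadRequest)
			return
		}
		for _, repo := range repos {
			refresh(repo)
		}
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	sample := []byte(`{"events":[
		{"action":"pull","target":{"repository":"app-one"}},
		{"action":"push","target":{"repository":"app-two","tag":"latest"}}
	]}`)
	repos, err := reposToRefresh(sample)
	fmt.Println(repos, err)
}
```

The point of the design is that the work done per notification is proportional to what changed, not to the size of the registry.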
The index server supports queries. A hypothetical URL would be:
https://registry.fedoraproject/index/static?architecture=amd64&annotatio...
This returns a JSON dump of the metadata for matching containers. The response is designed to be cacheable, with consistent ordering and ETag support. See https://github.com/owtaylor/flagstate/blob/master/docs/protocol.md for details.
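Because the response carries an ETag, a client can poll cheaply with a conditional GET. The following Go sketch shows the general pattern using standard HTTP semantics (If-None-Match and 304 Not Modified); it is not taken from the Flagstate protocol document. cachedIndex, fakeIndex, and the example URL are hypothetical, and the fake transport exists only so the sketch runs without a network.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// cachedIndex holds the last response body together with its ETag.
type cachedIndex struct {
	etag string
	body []byte
}

// refresh performs a conditional GET. It reports whether the cached
// body was replaced; on 304 Not Modified the cache is left untouched.
func (c *cachedIndex) refresh(client *http.Client, url string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	if c.etag != "" {
		req.Header.Set("If-None-Match", c.etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		return false, nil
	case http.StatusOK:
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return false, err
		}
		c.etag = resp.Header.Get("ETag")
		c.body = body
		return true, nil
	default:
		return false, fmt.Errorf("GET %s: unexpected status %d", url, resp.StatusCode)
	}
}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeIndex returns a client whose transport simulates a server with a
// fixed ETag and body (test double, not a real index server).
func fakeIndex(etag, body string) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		resp := &http.Response{Header: http.Header{}, Request: r}
		resp.Header.Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			resp.StatusCode = http.StatusNotModified
			resp.Body = http.NoBody
		} else {
			resp.StatusCode = http.StatusOK
			resp.Body = io.NopCloser(strings.NewReader(body))
		}
		return resp, nil
	})}
}

func main() {
	client := fakeIndex(`"v1"`, `{}`)
	var cache cachedIndex
	for i := 0; i < 2; i++ {
		changed, err := cache.refresh(client, "https://index.example/static")
		fmt.Println("changed:", changed, "error:", err)
	}
}
```

Consistent ordering matters here: if the server emitted the same results in a different order each time, the ETag would change even though nothing in the registry had.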
In addition to the Flatpak use case, this would also be useful for the Cockpit project, which wants to provide a nice interface for browsing Cockpit plugins that are packaged as container images. Future use cases that could be handled with fairly simple extensions of the Flagstate code include providing a backend for CLI searches (docker search, podman search, ...) and providing a more comprehensive web frontend than what is currently on registry.fedoraproject.org, one that shows names, descriptions, and so forth extracted from container labels.
I'd like to propose that we work toward a deployment of this on Fedora infrastructure.
Mini-FAQ
========
Can't this be done by just traversing the registry in a cron job and writing a static index?
Without the incremental update process, it's going to be impossible to provide a reasonable update frequency for a medium-large index. It would be possible to static-generate a fixed set of queries instead of having dynamic responses, but I think the current approach is more flexible - it avoids having to hard-code, say, Flatpak specifics in the server configuration.
Are there existing projects that could be used instead of writing our own?
Not that I'm aware of - 'reg', which is used for the current HTML frontend for registry.fedoraproject.org, takes the above approach of traversing the entire registry and writing a static index.
How is the data stored?
Data is stored in a PostgreSQL database - this is purely a shadow of the data as canonically stored in the registry, so it has minimal backup needs. PostgreSQL 9.4 or newer is needed for the jsonb functionality, though this requirement could be removed if necessary.
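To show the kind of query jsonb makes convenient, here is a small Go sketch that builds a parameterized statement using PostgreSQL's jsonb containment operator (@>). The table and column names (image, repository, tag, architecture, annotations) are invented for the example and are not Flagstate's schema; buildQuery is likewise a hypothetical helper. All user-supplied values travel as bind parameters rather than being spliced into the SQL text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// buildQuery returns a SQL statement with $N placeholders plus the
// matching argument list. An empty arch or empty annotations map means
// "no filter on that field".
func buildQuery(arch string, annotations map[string]string) (string, []interface{}, error) {
	var where []string
	var args []interface{}

	if arch != "" {
		args = append(args, arch)
		where = append(where, fmt.Sprintf("architecture = $%d", len(args)))
	}
	if len(annotations) > 0 {
		// json.Marshal sorts map keys, so the document is deterministic.
		doc, err := json.Marshal(annotations)
		if err != nil {
			return "", nil, err
		}
		args = append(args, string(doc))
		// "a @> b" is true when jsonb value a contains every key/value in b.
		where = append(where, fmt.Sprintf("annotations @> $%d::jsonb", len(args)))
	}

	query := "SELECT repository, tag FROM image"
	if len(where) > 0 {
		query += " WHERE " + strings.Join(where, " AND ")
	}
	// A fixed ordering keeps repeated responses byte-for-byte comparable.
	query += " ORDER BY repository, tag"
	return query, args, nil
}

func main() {
	query, args, err := buildQuery("amd64", map[string]string{"example.key": "value"})
	fmt.Println(query)
	fmt.Println(args, err)
}
```

The returned query and args can be passed directly to database/sql's Query method with any PostgreSQL driver.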
Why the language/license choice?
Flagstate is written in Go and licensed under the Apache License 2.0 to match the docker/distribution codebase and maximize the chance of creating a community.
Is it secure?
It's hard to say without some careful checking. The index only indexes public information, and care is taken when building SQL queries. DoS is always a tricky thing to handle on any unauthenticated API - setting the statement_timeout PostgreSQL parameter would make accidental DoS harder.
infrastructure@lists.fedoraproject.org