During Flock 2016, I had the opportunity to talk with Adam Miller, Dennis Gilmore, and Pierre-Yves Chibon about the technical challenges of distributing Docker images through Fedora's extensive mirror network. These conversations helped me solidify a proposal for how Fedora could solve this problem, outlined below.
High level view
===============
In summary, the proposal is to write a patch for the docker client that will give it the capability to accept metalink responses upon docker pull operations. We would also need to add support for Docker images to Mirror List and Mirror Manager. Additionally, we will need a small tool to pull the content to be mirrored out of a docker registry and write it to disk in a format that can be mirrored, as well as some Ansible code to run the tool when there is new content to be mirrored.
Background
==========
The Fedora project wishes to begin distributing types of content that it has not distributed in the past. One of the types that has been identified as a goal is the Docker image. Adam Miller has already done the work that will allow packagers to build Docker images, but we still need a way to distribute those builds to Fedora's users. Adam Miller's implementation helpfully drops the builds we want into a Docker registry.
Proposed Changes
================
Mirror List
-----------
Users will be pointing their docker clients at Mirror List when they docker pull Fedora's Docker images. In order for this to work, we will need to make two changes to Mirror List so that it can respond to the docker client properly. The first change is that Mirror List will need to respond with a special header and a body of "{}" when the docker client sends a GET request for /v2/. The second change is that it will need to return a metalink document when the client makes additional requests so that the clients can be redirected to a list of mirrors near their locations, just as it does with the dnf client today.
The docker client typically connects to port 5000. We could run a second instance of Mirror List on port 5000 if we wanted to isolate it from the current instance. We can also have the docker client pull from 443 as dnf does if we want to keep the deployment simpler.
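To make the first change concrete, here is a minimal sketch using only the Python standard library (the real Mirror List is its own codebase, so the handler below is illustrative only) of the /v2/ handshake response the docker client checks for:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class V2CheckHandler(BaseHTTPRequestHandler):
    """Answers the docker client's GET /v2/ probe the way a v2 registry must."""

    def do_GET(self):
        if self.path == "/v2/":
            body = b"{}"
            self.send_response(200)
            # This header is how the client recognizes a v2 registry.
            self.send_header("Docker-Distribution-API-Version", "registry/2.0")
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

# To serve on the docker client's usual port:
#     HTTPServer(("", 5000), V2CheckHandler).serve_forever()
```

The Docker-Distribution-API-Version header and the empty JSON body are what the client uses to decide it is talking to a v2 registry before it requests anything else.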
Mirror Manager
--------------
We will need to make a few changes to Mirror Manager as well. We will need to provide an interface to allow mirror admins to opt in/out of mirroring Docker content. We will also need to modify the crawler to detect whether a given mirror is up to date or not. We will need to make sure that UMDL is updated when content changes.
docker
------
The most significant work required will likely be modifying the docker client to enable it to properly handle the metalink responses it will be receiving from Mirror List. When requesting the manifest, it will receive a metalink document that will give it a priority ordered list of mirrors. It will need to work through the list in order until it reaches a mirror that has the correct checksum for the requested manifest. It will then use that same mirror for the subsequent blob requests.
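The mirror-walk logic could look roughly like the following. This is sketched in Python for brevity (the actual patch would live in the docker client's Go code), and `pick_mirror` is a hypothetical name:

```python
import hashlib
import urllib.request

def pick_mirror(mirrors, manifest_path, expected_sha256):
    """mirrors is the priority-ordered list of base URLs from the metalink."""
    for base in mirrors:
        try:
            with urllib.request.urlopen(base + manifest_path, timeout=30) as resp:
                manifest = resp.read()
        except OSError:
            continue  # unreachable mirror: fall through to the next one
        if hashlib.sha256(manifest).hexdigest() == expected_sha256:
            # Stick with this same mirror for all subsequent blob requests.
            return base, manifest
    raise RuntimeError("no mirror served a manifest with the expected checksum")
```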
There is some concern that such a feature would not be accepted by the upstream docker project. If we were to proceed with this proposal, we would propose this patch to the upstream Docker project. If upstream were not willing to accept the feature, we would need to have the Fedora docker packager carry this patch as a downstream add on.
New Tool
--------
The last piece that is needed is a tool that can create the filesystem tree that we want to synchronize out to the mirrors. The mirrors only need to carry manifests and blobs, so the tool needs only to pull these documents out of the registry that Adam Miller has set up and write them to disk in a particular structure. For optimization, we could use hardlinks for blobs that are common across the various images (for example, the Fedora base blob will be the same in all images) to save rsync time and mirror disk space.
Additionally, we will need a playbook to run this new tool in response to fedmsgs. We may be able to use Adam Miller's loopabull project to run such a playbook at the right times.
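A rough sketch of what such a tool might do follows. The on-disk layout and the `fetch` callable (any function that GETs a registry path and returns bytes) are assumptions for illustration, not decided details:

```python
import json
import os

def mirror_image(fetch, tree, name, tag):
    """Write one image's manifest and blobs under `tree`, hardlinking blobs
    into a shared pool so layers common to many images are stored once."""
    manifest = fetch("/v2/%s/manifests/%s" % (name, tag))
    pool = os.path.join(tree, "blobs")
    image_dir = os.path.join(tree, "images", name, tag)
    os.makedirs(pool, exist_ok=True)
    os.makedirs(image_dir, exist_ok=True)
    with open(os.path.join(image_dir, "manifest.json"), "wb") as f:
        f.write(manifest)
    for layer in json.loads(manifest).get("layers", []):
        digest = layer["digest"]                     # "sha256:..."
        pooled = os.path.join(pool, digest.replace(":", "-"))
        if not os.path.exists(pooled):
            # The first image to reference this blob downloads it...
            with open(pooled, "wb") as f:
                f.write(fetch("/v2/%s/blobs/%s" % (name, digest)))
        linked = os.path.join(image_dir, digest.replace(":", "-"))
        if not os.path.exists(linked):
            # ...and every other image just hardlinks it, costing no extra
            # disk space and no extra rsync transfer.
            os.link(pooled, linked)
```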
Conclusion
==========
Thanks for reading, and please respond with any comments or questions you have about this proposal. I'm happy to clarify any points further, and if you have any alternative proposals I'd love to hear those as well.
It would also help to have the following information. The mirrors will need to have this information in order to make informed decisions. (I will also have to make changes to quick-fedora-mirror to accommodate.)
1) How much content will the mirrors need to store? How will this amount change over time?
2) Do you have a plan for placing an upper bound on the total amount of data? (In Fedora things are moved to archive, though that has its own problems and of course doesn't really place an upper bound on anything.)
3) How much change do you expect per day? Churn is really important, and even now we can come close to the point where the master mirrors simply can't feed new content to the tier 1 mirrors fast enough for them to keep ahead of the changes we're making.
4) How will this be organized on the master mirrors? It really should be in a separate rsync module, and the archive (if that happens) should also be in a separate rsync module.
For some background, I have three mirrors, each identical (1U, 4x4TB disks, RAID0, cached on SSD using bcache). I mirror all content from the master mirrors. Right now I have about 12TB used, 3.5TB free. I could upgrade to larger disks for more space, but of course that costs money.
Basically we're past the point where we have to carefully consider how we ask the mirror network for more storage and bandwidth, and any plan for adding stuff should at least include some projections of disk and bandwidth usage.
- J<
On Tue, 2016-08-16 at 11:24 -0500, Jason L Tibbitts III wrote:
It would also help to have the following information. The mirrors will need to have this information in order to make informed decisions. (I will also have to make changes to quick-fedora-mirror to accommodate.)
- How much content will the mirrors need to store? How will this amount change over time?
Hello Jason! I confess that I don't have good answers to your questions, and I'm not sure who would. Many of these questions depend on how popular Docker images become with Fedora packagers.
How many bytes of content we will be creating does depend on how many applications get packaged as Docker images. I would guess the base image to be a few hundred megabytes, but we can probably use some fancy hardlinking to help reduce disk/network usage so that the base image is only stored once. The rest of the storage is going to be the diffs applied as layers on top of the base image that add whatever each individual image needs. The sizes of these layers will vary greatly by application, so this is also difficult to guess.
It's difficult to make informed guesses about this since I don't know how many Docker images the Fedora packagers will create (or at what rate they will create them over time).
- Do you have a plan for placing an upper bound on the total amount of data? (In Fedora things are moved to archive, though that has its own problems and of course doesn't really place an upper bound on anything.)
I don't have such a plan at this time. If anyone has suggestions about this, that would be helpful. It's unclear whether Docker images would live inside or outside of the traditional Fedora cycle (i.e., F24/F25/F26). It may have its own separate cycle, or we may just go with the current Fedora cycle.
- How much change do you expect per day? Churn is really important, and even now we can come close to the point where the master mirrors simply can't feed new content to the tier 1 mirrors fast enough for them to keep ahead of the changes we're making.
This again depends on how popular the Docker image offering becomes with our packagers, so it is difficult for me to make an educated guess. Popularity is difficult to predict.
- How will this be organized on the master mirrors? It really should be in a separate rsync module, and the archive (if that happens) should also be in a separate rsync module.
In my proposal e-mail I mentioned that it was important for Mirror Manager to allow mirror admins to opt in to hosting Docker content. Since we don't know the answers to so many of these questions, I suggest we opt mirrors out by default and let admins opt themselves in as they please. Our proposal didn't have an exact path for storing Docker images, but it was planned to be separated from the RPM and ISO content at a fairly high level in the tree.
I apologize for having so few answers. If anyone can shed more light on Jason's questions, please reply.
On Tue, Aug 16, 2016 at 12:46 PM, Randy Barlow <bowlofeggs@fedoraproject.org> wrote:
On Tue, 2016-08-16 at 11:24 -0500, Jason L Tibbitts III wrote:
It would also help to have the following information. The mirrors will need to have this information in order to make informed decisions. (I will also have to make changes to quick-fedora-mirror to accommodate.)
- How much content will the mirrors need to store? How will this amount change over time?
Hello Jason! I confess that I don't have good answers to your questions, and I'm not sure who would. Many of these questions depend on how popular Docker images become with Fedora packagers.
How many bytes of content we will be creating does depend on how many applications get packaged as Docker images. I would guess the base image to be a few hundred megabytes, but we can probably use some fancy hardlinking to help reduce disk/network usage so that the base image is only stored once. The rest of the storage is going to be the diffs applied as layers on top of the base image that add whatever each individual image needs. The sizes of these layers will vary greatly by application, so this is also difficult to guess.
It's difficult to make informed guesses about this since I don't know how many Docker images the Fedora packagers will create (or at what rate they will create them over time).
- Do you have a plan for placing an upper bound on the total amount of data? (In Fedora things are moved to archive, though that has its own problems and of course doesn't really place an upper bound on anything.)
I don't have such a plan at this time. If anyone has suggestions about this, that would be helpful. It's unclear whether Docker images would live inside or outside of the traditional Fedora cycle (i.e., F24/F25/F26). It may have its own separate cycle, or we may just go with the current Fedora cycle.
I think we can choose a reasonable archival time. Sorting out the implementation of that with the tool that does the layer data extraction might be a challenge, but I imagine it's one we can collectively sort out if necessary.
- How much change do you expect per day? Churn is really important, and even now we can come close to the point where the master mirrors simply can't feed new content to the tier 1 mirrors fast enough for them to keep ahead of the changes we're making.
This again depends on how popular the Docker image offering becomes with our packagers, so it is difficult for me to make an educated guess. Popularity is difficult to predict.
The current plan is that we will release Docker Layered Images on a two-week cadence, potentially in line with the Atomic Host two-week deliverable. This might change in the future, but that is the current plan.
- How will this be organized on the master mirrors? It really should be in a separate rsync module, and the archive (if that happens) should also be in a separate rsync module.
In my proposal e-mail I mentioned that it was important for Mirror Manager to allow mirror admins to opt in to hosting Docker content. Since we don't know the answers to so many of these questions, I suggest we opt mirrors out by default and let admins opt themselves in as they please. Our proposal didn't have an exact path for storing Docker images, but it was planned to be separated from the RPM and ISO content at a fairly high level in the tree.
+1
-AdamM
I apologize for having so few answers. If anyone can shed more light on Jason's questions, please reply.
On Tue, Aug 16, 2016 at 11:24:20AM -0500, Jason L Tibbitts III wrote:
- How will this be organized on the master mirrors? It really should be in a separate rsync module, and the archive (if that happens) should also be in a separate rsync module.
I think this is a good idea; it will also help make the changes to MirrorManager simpler, since adding a new category/product to your mirror would be the equivalent of adding a new rsync module.
Pierre
The last piece that is needed is a tool that can create the filesystem tree that we want to synchronize out to the mirrors. The mirrors only need to carry manifests and blobs, so the tool needs only to pull these documents out of the registry that Adam Miller has set up and write them to disk in a particular structure. For optimization, we could use hardlinks for blobs that are common across the various images (for example, the Fedora base blob will be the same in all images) to save rsync time and mirror disk space.
Additionally, we will need a playbook to run this new tool in response to fedmsgs. We may be able to use Adam Miller's loopabull project to run such a playbook at the right times.
You can also look at the simple fedmsg-consumers we already have that run Ansible playbooks:
https://github.com/fedora-infra/fedmsg-genacls
https://github.com/fedora-infra/fedmsg-fasclient
https://github.com/fedora-infra/fedmsg-pkgdb-sync-git
Note that it will require a change in the sudoers file to allow the process to call the playbook (and you cannot pass arguments to the command, as that would break the sudoers configuration).
Pierre
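For illustration, a hypothetical sudoers.d fragment along these lines (the user name and playbook path are invented) would let the consumer run exactly one fixed command, with no caller-supplied arguments:

```
# /etc/sudoers.d/docker-mirror (hypothetical): the fedmsg user may run this
# exact command only -- no arguments can be passed by the caller.
fedmsg ALL=(root) NOPASSWD: /usr/bin/ansible-playbook /srv/playbooks/docker-mirror.yml
```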
On Tue, Aug 16, 2016 at 3:33 PM, Randy Barlow <bowlofeggs@fedoraproject.org> wrote:
The docker client typically connects to port 5000. We could run a second instance of Mirror List on port 5000 if we wanted to isolate it from the current instance. We can also have the docker client pull from 443 as dnf does if we want to keep the deployment simpler.
Putting this on port 5000 wouldn't be a big issue I'd say.
Mirror Manager
We will need to make a few changes to Mirror Manager as well. We will need to provide an interface to allow mirror admins to opt in/out of mirroring Docker content. We will also need to modify the crawler to detect whether a given mirror is up to date or not. We will need to make sure that UMDL is updated when content changes.
docker
The most significant work required will likely be modifying the docker client to enable it to properly handle the metalink responses it will be receiving from Mirror List. When requesting the manifest, it will receive a metalink document that will give it a priority ordered list of mirrors. It will need to work through the list in order until it reaches a mirror that has the correct checksum for the requested manifest. It will then use that same mirror for the subsequent blob requests.
So, we get a MirrorManager module (previously called repo) for each different container we ship? Currently the metalink level is repo-arch, and that is the level at which MirrorManager retains and sends checksums as part of the metalink. If this is the case, I think we have quite a bit more work ahead for MirrorManager to make it able to work with lots of modules.
Also, will we be signing the images, or is the use of metalink the only security people get? I can guarantee you that some people will feel like their closest mirror is run by their competitor and not want to use it, or that they will hit mirrorlist at one point when it's down, stop using the metalink, and just insert $randommirror directly and use that. I would want to make sure that even if they do this, they still get some sort of verification of the data. This was previously mentioned as one of the main blockers for getting this stuff mirrored out.
There is some concern that such a feature would not be accepted by the upstream docker project. If we were to proceed with this proposal, we would propose this patch to the upstream Docker project. If upstream were not willing to accept the feature, we would need to have the Fedora docker packager carry this patch as a downstream add on.
So that would mean that the Fedora docker images are only available for people who run them on a Fedora host, or people we manually tell "Yeah, use this mirror url that's outside of our control"? I'm not sure that this will go over smoothly with other people...
Do you have any idea how likely it is to get this patch accepted upstream? From what I've heard, the Docker people are not really happy about merging distro-specific things, so this is a considerable risk, unless we just accept that users who are not running a Fedora host can't run our images...
On Fri, 2016-08-19 at 12:19 +0000, Patrick Uiterwijk wrote:
On Tue, Aug 16, 2016 at 3:33 PM, Randy Barlow
docker
The most significant work required will likely be modifying the docker client to enable it to properly handle the metalink responses it will be receiving from Mirror List. When requesting the manifest, it will receive a metalink document that will give it a priority ordered list of mirrors. It will need to work through the list in order until it reaches a mirror that has the correct checksum for the requested manifest. It will then use that same mirror for the subsequent blob requests.
So, we get a MirrorManager module (previously called repo) for each different container we ship? Currently the metalink level is repo-arch, and that is the level at which MirrorManager retains and sends checksums as part of the metalink. If this is the case, I think we have quite a bit more work ahead for MirrorManager to make it able to work with lots of modules.
Hello Patrick, thanks for your reply!
I don't think anyone has decided on this particular detail yet, though you raising it is good because we should probably figure it out soon. I personally wouldn't think we would want to offer a module for each container, as that could quickly get out of hand UI-wise. If Fedora ends up with a container per application, this could easily be hundreds or thousands of modules.
However, that still leaves the question of what *should* a module be? Should all of Docker just be one module, so mirrors can choose all-or-nothing? Or, if Docker containers end up getting based on Fedora releases (like F23/24), could an example module be "F23-docker-images"? Or perhaps we could add in arch like "F24-x86_64-docker-images"?
It has also been suggested that the Docker images could be like Atomic, where there is a release every two weeks and nothing is supported longer than that. If we went that route, the only thing I can think of is to either do all-or-nothing docker, or arch-specific modules.
Does anybody have any thoughts or other ideas on where would be good to draw the lines here?
Also, will we be signing the images, or is the use of metalink the only security people get? I can guarantee you that some people will feel like their closest mirror is run by their competitor and not want to use it, or that they will hit mirrorlist at one point when it's down, stop using the metalink, and just insert $randommirror directly and use that. I would want to make sure that even if they do this, they still get some sort of verification of the data. This was previously mentioned as one of the main blockers for getting this stuff mirrored out.
You raise an important concern that I also share here. Docker Manifests have a built-in signature feature. To be honest, I have not learned the details about the signature feature, and worse, my personal Manifest familiarity is limited to their v2 schema 1 format[0], which has been obsoleted by a schema 2 format that I have not yet familiarized myself with. So let's say that <hand_waving>yes, we can sign the Docker Manifests</hand_waving> somehow.
The Blobs (Docker calls the Image layers Blobs) themselves are not signed, but they are referenced by checksum in the Manifest. You can see an example Manifest at [0]. Thus, theoretically, if the user trusts the Manifest because it is signed by Fedora, they should be able to trust the Blob layers that they download so long as they do match the expected checksum. The Docker client does seem to check the checksum in my experience.
By the way, the metalink response from Mirror List will include the expected checksum for the manifest that the client is trying to pull. Thus, in addition to Fedora signing the Manifest itself, we can also have the client validate the checksum of the Manifest they receive from the mirror. If we ensure that clients always communicate with Mirror List over TLS, this will add another layer of validation for us.
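The verification chain described above boils down to a simple digest comparison. A sketch, with the function name being hypothetical:

```python
import hashlib

def blob_matches_manifest(blob_bytes, manifest_digest):
    """Return True if a downloaded blob matches the digest the (signed)
    manifest references, e.g. 'sha256:abcd...'."""
    algo, _, expected = manifest_digest.partition(":")
    if algo != "sha256":
        raise ValueError("unexpected digest algorithm: %r" % algo)
    return hashlib.sha256(blob_bytes).hexdigest() == expected
```

If the manifest itself is signed by Fedora (or its checksum comes over TLS from Mirror List), this check extends that trust to every blob a mirror serves.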
Due to a technical detail about how the Docker client works, users will not be able to docker pull from a mirror of their choosing with the design I am proposing here. When the docker client attempts to pull, the first thing it does is perform a GET on /v2/. The registry *must* respond with a particular header that indicates that it is a Docker v2 registry, and include {} as the body response. Due to the way our mirrors currently operate, I do not believe we will be able to have the /v2/ path without negotiating for that with our mirrors. I suspect that many mirror admins would not like to give us that path. This is why the plan is to have all Fedora docker clients perform docker pull against Mirror List, so that Mirror List can send the required /v2/ response and then send them the metalink when the user requests the manifest.
There are two technical solutions that can resolve the problem I described in my last paragraph, but they will both require mirror admins to agree to things they may not want to agree to. One is that we could negotiate with mirror admins to allow us to store "docker stuff" at /v2/ on their mirrors. Another possibility is for mirror admins to each run a Docker registry. I think each of these will present a bit of a negotiating challenge for us, which is why I proposed the particular plan I wrote up instead of either of these. However, if anyone wants to explore these options further please feel free.
There is some concern that such a feature would not be accepted by the upstream docker project. If we were to proceed with this proposal, we would propose this patch to the upstream Docker project. If upstream were not willing to accept the feature, we would need to have the Fedora docker packager carry this patch as a downstream add on.
So that would mean that the Fedora docker images are only available for people who run them on a Fedora host, or people we manually tell "Yeah, use this mirror url that's outside of our control"? I'm not sure that this will go over smoothly with other people...
Unfortunately I believe you are correct, except that users also won't be able to point their docker client at our mirrors. The only way we could enable the unpatched docker client to pull Fedora content would be to allow users to access the Docker Registry that the OSBS builds are pushed into (which is also where we will be getting the content to distribute to the mirrors). The unfortunate thing about this proposal is that there is no single system other than the OSBS registry that can handle the entirety of the requests coming from a docker pull command.
Do you have any idea how likely it is to get this patch accepted upstream? From what I've heard, the Docker people are not really happy about merging distro-specific things, so this is a considerable risk, unless we just accept that users who are not running a Fedora host can't run our images...
It is difficult for me to answer this question, but Docker does have a reputation of refusing contributions that help enable competing registries. I would say that there is a strong chance that the patch I am proposing would not be accepted upstream.
If it is important to us that non-Fedora users can docker pull our images, I can think of two options:
0) Negotiate with mirror-admins to allow us to either have /v2/ on the mirror, or to run a full registry instead of just rsyncing Docker content from us.
1) Allow non-Fedora users to docker pull from the same registry that OSBS is putting its content into. This may be difficult to scale without a CDN.
Given these new "holes" revealed in my proposal, what does everyone think? Patrick raised some valid concerns here. I'd like to make sure we're headed in a good direction before going forward with these plans. Please speak up if you find any of the above to be "show stoppers", or if you can think of alternative solutions that I've overlooked.
Thanks everyone for the conversation so far!
On Sun, 2016-08-21 at 14:55 -0400, Randy Barlow wrote:
When the docker client attempts to pull, the first thing it does is perform a GET on /v2/. The registry *must* respond with a particular header that indicates that it is a Docker v2 registry, and include {} as the body response. Due to the way our mirrors currently operate, I do not believe we will be able to have the /v2/ path without negotiating for that with our mirrors. I suspect that many mirror admins would not like to give us that path. This is why the plan is to have all Fedora docker clients perform docker pull against Mirror List, so that Mirror List can send the required /v2/ response and then send them the metalink when the user requests the manifest.
I don't know why it didn't occur to me before, but there is one more option we could consider. We could also write a patch for the docker client to allow a path other than /v2/ to be specified during a Docker pull. Of course, this patch may also be difficult to get accepted upstream, but if we could get it accepted upstream it might make negotiation with mirror admins easier. The docker client will still look for the header that declares the other end to be a v2 registry though, so mirror admins would still need to make sure that header is sent with requests in the Docker path.
Hello! Based on the responses I've received so far, some new information I learned about Docker's Manifest v2, and discussions in #fedora-releng, I would like to propose this second draft of the proposal to mirror Docker images. Thank you for reading, and please do voice your thoughts!
Notable change from the first post
==================================
Instead of metalink responses, we will add support to the Linux docker client for the Docker manifest schema 2's urls feature.
High level view
===============
In summary, the proposal is to work with @runcom[0][1] to write a patch for the docker client that will give it the capability to use the Docker Manifest schema 2 urls feature[2] during docker pull operations. We would also need to add support for Docker images to Mirror List and Mirror Manager. Additionally, we will need a small tool to pull the content to be mirrored out of a docker registry and write it to disk in a format that can be mirrored, as well as some Ansible code to run the tool when there is new content to be mirrored.
Background
==========
The Fedora project wishes to begin distributing types of content that it has not distributed in the past. One of the types that has been identified as a goal is the Docker image. Adam Miller has already done the work that will allow packagers to build Docker images, but we still need a way to distribute those builds to Fedora's users. Adam Miller's implementation helpfully drops the builds we want into a Docker registry.
Proposed Changes
================
Mirror List
-----------
Users will be pointing their docker clients at Mirror List when they docker pull Fedora's Docker images. In order for this to work, we will need to make two changes to Mirror List so that it can respond to the docker client properly. The first change is that Mirror List will need to respond with a special header and a body of "{}" when the docker client sends a GET request for /v2/. The second change is that it will need to return a Docker Manifest schema 2 document containing a list of mirrors that have the requested blobs when the client makes additional requests, so that clients can retrieve the blobs from a list of mirrors near their locations, similar to how it works with the dnf client today.
The docker client typically connects to port 5000. We could run a second instance of Mirror List on port 5000 if we wanted to isolate it from the current instance. We can also have the docker client pull from 443 as dnf does if we want to keep the deployment simpler.
Mirror Manager
--------------
We will need to make a few changes to Mirror Manager as well. We will need to provide an interface to allow mirror admins to opt in/out of mirroring Docker content. We will also need to modify the crawler to detect whether a given mirror is up to date or not. We will need to make sure that UMDL is updated when content changes.
There was some discussion about how the Docker content would be organized on the master mirror. We could either give an all-or-nothing Docker module for mirrors to choose from, or we could split the Docker content by arch (or perhaps even primary vs. secondary). I don't have any preference about which we go with. At first it seemed that we couldn't do it by arch, since the docker client is presented with a list of manifests by arch (which made it seem that all mirrors would need all arches), but I *think* the client would then make a second request to Mirror List for the specific arch it wanted. If I'm correct, this would mean that this second request would be when Mirror List could pick a list of mirrors that it knew had the requested content. I'm happy to take the time to verify my guesses here if this mailing list wants to pursue that option. I'm fine with any way of splitting that is desired, but it has been rightly suggested that we not create a module per Docker repository, since there could be hundreds or thousands of them.
There was a question about how we would deal with archived data, and I believe that is still an open question. It sounds like we can plan that out later.
docker
------
Patrick Uiterwijk suggested that I look into the new schema 2 manifest that Docker has defined, and when I did so I happened upon a new feature that was not part of schema 1: a list of URLs for each Blob can be listed in the Manifest[2]. We had been thinking that we'd use metalink responses and add support to the docker client, but this feature is built into the Docker Manifest.
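For reference, a schema 2 manifest using the urls feature looks roughly like this (expressed here as a Python dict; the digest is just an example value and the mirror hostnames are invented):

```python
import json

# Illustrative schema 2 manifest carrying alternate download locations per
# layer -- exactly the hook a mirror network needs.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 48216931,
            "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
            # Alternate locations the client may fetch this blob from:
            "urls": [
                "https://mirror1.example.org/fedora/docker/blobs/sha256-e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
                "https://mirror2.example.org/fedora/docker/blobs/sha256-e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
            ],
        },
    ],
}

print(json.dumps(manifest, indent=2))
```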
With great excitement, I thought it would be prudent to do some testing with this feature. Sadly, I came to learn that the feature did not work. I spent an unfortunate amount of time trying to figure out if I had something wrong with my test setup before diving into the code. Once I looked at the code, it became clear that the feature only works for the Windows version of the Docker client! The original pull request[3] was submitted by Microsoft and only worked for the Windows client. Later, Antonio Murdaca submitted a pull request[1] to expand the support for other operating systems, but it was not accepted. In response to all that, he opened an issue[4] to request the feature be expanded.
Despite the difficulty in getting this feature accepted upstream, I think it might be good for us to work with Antonio to try to get this feature implemented and accepted in upstream docker, rather than going with the previous metalink proposal. We may be able to work with the Fedora package maintainer to get Antonio's existing patch carried in our downstream build until it is upstream, if everyone agrees that would be a good mid-term solution.
New Tool --------
The last piece that is needed is a tool that can create the filesystem tree that we want to synchronize out to the mirrors. The mirrors only need to carry manifests and blobs, so the tool needs only to pull these documents out of the registry that Adam Miller has set up and write them to disk in a particular structure. For optimization, we could use hardlinks for blobs that are common across the various images (for example, the Fedora base blob will be the same in all images) to save rsync time and mirror disk space.
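A minimal sketch of the hardlinking idea, assuming an invented on-disk layout with a shared pool directory (the real tool would also fetch content from the registry and write Manifests):

```python
import os
import tempfile

def write_blob(tree, image, digest, data):
    """Write one blob under <tree>/<image>/blobs/<digest>, hardlinking it
    into a shared pool so identical blobs (e.g. the Fedora base layer)
    occupy disk space only once. The layout here is made up."""
    pool_path = os.path.join(tree, ".pool", digest)
    os.makedirs(os.path.dirname(pool_path), exist_ok=True)
    if not os.path.exists(pool_path):
        with open(pool_path, "wb") as f:
            f.write(data)
    dest = os.path.join(tree, image, "blobs", digest)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if not os.path.exists(dest):
        # Hardlink: rsync -H preserves this, saving transfer time and
        # mirror disk space for shared layers.
        os.link(pool_path, dest)
    return dest

# Demo: the same base blob written for two images shares one inode.
tree = tempfile.mkdtemp()
a = write_blob(tree, "fedora-httpd", "sha256:base", b"base layer bytes")
b = write_blob(tree, "fedora-mariadb", "sha256:base", b"base layer bytes")
print(os.stat(a).st_ino == os.stat(b).st_ino)  # True: same inode
```

Mirrors syncing with `rsync -H` would then carry each shared layer's bytes only once.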
Additionally, we will need a playbook to run this new tool in response to fedmsgs. We may be able to use Adam Miller's loopabull project to run such a playbook at the right times.
Signing -------
Patrick raised the question of signing. Docker supported signing within the manifest with the schema 1 version, but with schema 2 the embedded signatures have been removed in favor of the Notary service[5]. This may be an option for Fedora, or there may be alternatives we could look into if desired. I'm happy to dig more if we would prefer not to use Notary for some reason.
The Blobs (Docker calls the Image layers Blobs) themselves are not signed, but they are referenced by checksum in the Manifest. You can see an example Manifest at [6]. Thus, theoretically, if the user trusts the Manifest because it is signed by Fedora, they should be able to trust the Blob layers that they download so long as they do match the expected checksum. The Docker client does seem to check the checksum in my experience.
By the way, the Manifest response from Mirror List will include the expected checksums for the Blobs that the client is trying to pull. Thus, in addition to Fedora signing the Manifest itself, we can also have the client validate the checksums of the Blobs they receive from the mirror. If we ensure that clients always communicate with Mirror List over TLS, this will add another layer of validation for us.
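The checksum validation the client performs can be sketched like this (hypothetical helper; the real docker client does the equivalent internally):

```python
import hashlib

def blob_is_valid(expected_digest, blob_bytes):
    """Check a downloaded Blob against the digest from the Manifest.
    Digests look like 'sha256:<hex>'."""
    algo, _, want = expected_digest.partition(":")
    got = hashlib.new(algo, blob_bytes).hexdigest()
    return got == want

blob = b"layer contents fetched from some mirror"
digest = "sha256:" + hashlib.sha256(blob).hexdigest()

print(blob_is_valid(digest, blob))         # True: blob matches Manifest
print(blob_is_valid(digest, b"tampered"))  # False: altered on the mirror
```

Because the digest comes from the Manifest (served over TLS by Mirror List), a compromised or corrupted mirror cannot substitute a different Blob undetected.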
Optional mirror registries --------------------------
A notable drawback to this proposal is that users will not be able to point their docker client directly at a mirror and docker pull. This is because the docker client does not support a path prefix in front of the docker v2 API, and because it expects to see certain headers in the response. Instead, users will always have to point their clients at mirror list so that it can send them the manifest with URLs to the blobs on the mirrors.
However, we could have a "phase 2" plan, where we ask mirrors to consider running their own full registries for users to pull from. Of course, this would require opt-in and hands-on work by the mirror admins (similar to how some mirrors support ftp or rsync, but not all do). Without a registry on the mirror, there isn't a good way that I know of to allow users to docker pull directly from a specific mirror. I'm also not sure how we could communicate to users which mirrors have done this vs. which haven't.
Pros/Cons =========
In comparison to the previous proposal I sent:
Pros:
* The needed change in the docker client is more likely to be accepted upstream, which means non-Fedora OS users will still be able to docker pull Fedora images.
* The needed change is smaller than would be necessary for the metalink solution.
* There is already a working patch available for us to carry in the mid-term, if we wish to do so.
Cons:
* Mirror list will need to dynamically serve Manifests so that it can insert the URLs into them, as opposed to serving the metalink documents. In my opinion, this is a minor difference.
Conclusion ==========
Thanks for reading, and please respond with any comments or questions you have about this proposal. I'm happy to clarify any points further, and if you have any alternative proposals I'd love to hear those as well.
[0] https://github.com/docker/distribution/issues/1825
[1] https://github.com/docker/docker/pull/23014
[2] https://docs.docker.com/registry/spec/manifest-v2-2/#/image-manifest-field-descriptions
[3] https://github.com/docker/docker/pull/22866
[4] https://github.com/docker/distribution/issues/1825
[5] https://docs.docker.com/notary/
[6] https://docs.docker.com/registry/spec/manifest-v2-2/#/example-image-manifest
On 24 August 2016 at 18:25, Randy Barlow bowlofeggs@fedoraproject.org wrote:
Hello! Based on the responses I've received so far, some new information I learned about Docker's Manifest v2, and discussions in #fedora-releng, I would like to propose this second draft of the proposal to mirror Docker images. Thank you for reading, and please do voice your thoughts!
OK my major concern with mirroring Docker images are the following:
1) How many images?
2) How large are these images?
3) How often do they change?
Mirrors are a volunteer organization and most of them have only about a 1-2 TB of space they can put up for mirroring. The regular distributions are already hitting that so the larger the amount we use the less they will carry.
Also, the size of the images will affect how long it takes mirrors to get data from us. The images will take time to move out to the mirrors; the larger the image, the longer it takes to mirror. The more churn on the disks, the more it affects all the other projects being mirrored.
Asking mirrors to run additional software is usually a non-starter unless that software is being used by multiple projects. [We still have mirrors who are only able to allow FTP out of their sites, and many are run by student sysadmins who are constantly changing, so their bosses want anything to basically 'run itself']
On Thu, Aug 25, 2016 at 12:17:06PM -0400, Stephen John Smoogen wrote:
Hello! Based on the responses I've received so far, some new information I learned about Docker's Manifest v2, and discussions in #fedora-releng, I would like to propose this second draft of the proposal to mirror Docker images. Thank you for reading, and please do voice your thoughts!
OK my major concern with mirroring Docker images are the following:
- How many images?
Initially, dozens. Probably growing to the order of 100s.
- How large are these images?
Between tens and couple-of-hundreds of MB each.
- How often do they change?
Probably a lot -- they'll get rebuilt whenever there's a security update in any of the packages inside.
On 30 August 2016 at 13:10, Matthew Miller mattdm@fedoraproject.org wrote:
On Thu, Aug 25, 2016 at 12:17:06PM -0400, Stephen John Smoogen wrote:
Hello! Based on the responses I've received so far, some new information I learned about Docker's Manifest v2, and discussions in #fedora-releng, I would like to propose this second draft of the proposal to mirror Docker images. Thank you for reading, and please do voice your thoughts!
OK my major concern with mirroring Docker images are the following:
- How many images?
Initially, dozens. Probably growing to the order of 100s.
- How large are these images?
Between tens and couple-of-hundreds of MB each.
We will go with 48 images and 48 MB per image, which will be 2304 MB; for an initial layout that isn't too bad.
- How often do they change?
Probably a lot -- they'll get rebuilt whenever there's a security update in any of the packages inside.
Do the old ones need to be kept around for GPL and other compliance? How does one get the source code of what XYZ image is and XYZ+1 image?
This is where the disk storage becomes problematic and the mirroring harder. A lot of mirrors have a throughput of 1-5 Mb/s when combined with all the other people trying to mirror. Rawhide already overwhelms a bunch of them. We need to make sure that our 'update' path makes sense so that we meet a bunch of different things and make sure that people can actually sync.
On Tue, Aug 30, 2016 at 01:53:23PM -0400, Stephen John Smoogen wrote:
Do the old ones need to be kept around for GPL and other compliance?
No, as long as we have a way to trace back from a given image to its source components, which the system does, and as long as we don't throw those away, which we don't.
How does one get the source code of what XYZ image is and XYZ+1 image?
Handwavy something that Adam can probably explain. :)
This is where the disk storage becomes problematic and the mirroring harder. A lot of mirrors have a throughput of 1-5 Mb/s when combined with all the other people trying to mirror. Rawhide already overwhelms a bunch of them. We need to make sure that our 'update' path makes sense so that we meet a bunch of different things and make sure that people can actually sync.
Yeah.
On Tue, Aug 30, 2016 at 1:14 PM, Matthew Miller mattdm@fedoraproject.org wrote:
On Tue, Aug 30, 2016 at 01:53:23PM -0400, Stephen John Smoogen wrote:
Do the old ones need to be kept around for GPL and other compliance?
No, as long as we have a way to trace back from a given image to its source components, which the system does, and as long as we don't throw those away, which we don't.
How does one get the source code of what XYZ image is and XYZ+1 image?
Handwavy something that Adam can probably explain. :)
XYZ and XYZ+1 images will be a sum of their parts, the parts will all originate from Fedora and will all persist in Fedora just as they do today. This is why we're not allowing non-Fedora content into our Docker Layered Images because how to solve this with content originating from third-party sources has not yet been figured out.
Effectively, DistGit will maintain the Dockerfile and scripts git history, and the builds, along with logs and the content-generator-imported metadata manifest of the layered image contents, will remain in koji. We will be able to reproduce each Docker Layered Image if we ever had to, since the content all originates from Fedora's dnf repos combined with koji and the docker namespace in DistGit. So for any given Layered Image and its corresponding Docker tag (which in this case corresponds to its version), we will be able to look back in koji+DistGit and find what is required to reproduce it, then reproduce it (if necessary).
-AdamM
This is where the disk storage becomes problematic and the mirroring harder. A lot of mirrors have a throughput of 1-5 Mb/s when combined with all the other people trying to mirror. Rawhide already overwhelms a bunch of them. We need to make sure that our 'update' path makes sense so that we meet a bunch of different things and make sure that people can actually sync.
Yeah.
-- Matthew Miller mattdm@fedoraproject.org Fedora Project Leader _______________________________________________ infrastructure mailing list infrastructure@lists.fedoraproject.org https://lists.fedoraproject.org/admin/lists/infrastructure@lists.fedoraproje...
On Wed, Aug 24, 2016 at 06:25:15PM -0400, Randy Barlow wrote:
Mirror List
Users will be pointing their docker clients at Mirror List when they docker pull Fedora's Docker images. In order for this to work, we will need to make two changes to Mirror List so that it can respond to the docker client properly. The first change is that Mirror List will need to respond with a special header and a body of "{}" when the docker client sends a GET request for /v2/. The second change is that it will need to return a Docker Manifest schema 2 document containing a list of mirrors that have the requested blobs when the client makes additional requests, so that the clients can retrieve the blobs from a list of mirrors near their locations, similar to how it does with the dnf client today.
The docker client typically connects to port 5000. We could run a second instance of Mirror List on port 5000 if we wanted to isolate it from the current instance. We can also have the docker client pull from 443 as dnf does if we want to keep the deployment simpler.
I am wondering if it would make sense to have a new mirrorlist-docker that would be different from the actual/current mirrorlist. It would allow easier modifications and evolutions w/o running into the risk of breaking the current mirrorlist.
New Tool
The last piece that is needed is a tool that can create the filesystem tree that we want to synchronize out to the mirrors. The mirrors only need to carry manifests and blobs, so the tool needs only to pull these documents out of the registry that Adam Miller has set up and write them to disk in a particular structure. For optimization, we could use hardlinks for blobs that are common across the various images (for example, the Fedora base blob will be the same in all images) to save rsync time and mirror disk space.
Additionally, we will need a playbook to run this new tool in response to fedmsgs. We may be able to use Adam Miller's loopabull project to run such a playbook at the right times.
Does loopabull work with our setup that relies on sudo? (I still think we can do w/o but I won't fight if we want to do w/ :))
Pierre
On Mon, 2016-08-29 at 10:55 +0200, Pierre-Yves Chibon wrote:
I am wondering if it would make sense to have a new mirrorlist-docker that would be different from the actual/current mirrorlist. It would allow easier modifications and evolutions w/o running into the risk of breaking the current mirrorlist.
Hello Pierre-Yves!
Are you suggesting a new project, or a new deployment instance of the same project?
On Mon, Aug 29, 2016 at 3:55 AM, Pierre-Yves Chibon pingou@pingoured.fr wrote:
On Wed, Aug 24, 2016 at 06:25:15PM -0400, Randy Barlow wrote:
Mirror List
Users will be pointing their docker clients at Mirror List when they docker pull Fedora's Docker images. In order for this to work, we will need to make two changes to Mirror List so that it can respond to the docker client properly. The first change is that Mirror List will need to respond with a special header and a body of "{}" when the docker client sends a GET request for /v2/. The second change is that it will need to return a Docker Manifest schema 2 document containing a list of mirrors that have the requested blobs when the client makes additional requests, so that the clients can retrieve the blobs from a list of mirrors near their locations, similar to how it does with the dnf client today.
The docker client typically connects to port 5000. We could run a second instance of Mirror List on port 5000 if we wanted to isolate it from the current instance. We can also have the docker client pull from 443 as dnf does if we want to keep the deployment simpler.
I am wondering if it would make sense to have a new mirrorlist-docker that would be different from the actual/current mirrorlist. It would allow easier modifications and evolutions w/o running into the risk of breaking the current mirrorlist.
New Tool
The last piece that is needed is a tool that can create the filesystem tree that we want to synchronize out to the mirrors. The mirrors only need to carry manifests and blobs, so the tool needs only to pull these documents out of the registry that Adam Miller has set up and write them to disk in a particular structure. For optimization, we could use hardlinks for blobs that are common across the various images (for example, the Fedora base blob will be the same in all images) to save rsync time and mirror disk space.
Additionally, we will need a playbook to run this new tool in response to fedmsgs. We may be able to use Adam Miller's loopabull project to run such a playbook at the right times.
Does loopabull work with our setup that relies on sudo? (I still think we can do w/o but I won't fight if we want to do w/ :))
Not at present but I'm sure we could sort out adding that functionality to loopabull.
-AdamM
Pierre
On Wed, 2016-08-24 at 18:25 -0400, Randy Barlow wrote:
Signing
Patrick raised the question of signing. Docker supported signing within the manifest with the schema 1 version, but with schema 2 the embedded signatures have been removed in favor of the Notary service[5]. This may be an option for Fedora, or there may be alternatives we could look into if desired. I'm happy to dig more if we would prefer not to use Notary for some reason.
The Blobs (Docker calls the Image layers Blobs) themselves are not signed, but they are referenced by checksum in the Manifest. You can see an example Manifest at [6]. Thus, theoretically, if the user trusts the Manifest because it is signed by Fedora, they should be able to trust the Blob layers that they download so long as they do match the expected checksum. The Docker client does seem to check the checksum in my experience.
By the way, the Manifest response from Mirror List will include the expected checksums for the Blobs that the client is trying to pull. Thus, in addition to Fedora signing the Manifest itself, we can also have the client validate the checksums of the Blobs they receive from the mirror. If we ensure that clients always communicate with Mirror List over TLS, this will add another layer of validation for us.
I was thinking some more about this over the past few days, and I realized that signing the Manifest itself might be a problem since the Manifest would be dynamically generated under the current proposal (so that the urls of the Blobs can be set to mirrors near the requester). I would guess that we don't want sigul to have any connection to mirror list, so having automated signing here probably isn't ideal.
One thing that may work is to have the Manifest Lists be signed instead of the Manifests themselves. The Manifest Lists are the list of Manifests that are available, one per supported arch. During docker pull, we can give the client this list in response to the initial request. The list contains URLs for each available Manifest, referenced by checksum. Thus, if the initial list is signed, the client should make follow-up requests for the Manifest by checksum (it's part of the URL) and should validate the checksum of the Manifest it receives. Thus, if we sign the Manifest List, we've signed the checksum of the Manifest, which references the Blobs by checksums as well.
How does all that sound? I really don't know much about the Notary service, or about how the Docker client uses it, so I'm not sure whether the Manifest List can be signed or not. I'm not even sure that we want to go with Notary, but I did want to bring up this issue early on.
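The digest chain I'm describing can be sketched in a few lines of Python. Note that hashing re-serialized JSON here is only an approximation of Docker's byte-exact digesting, and the documents and the signing step are stand-ins:

```python
import hashlib
import json

def digest(doc):
    """Approximate content digest of a JSON document (illustrative only;
    real Docker digests are over the exact serialized bytes)."""
    raw = json.dumps(doc, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(raw).hexdigest()

blob_digest = "sha256:" + hashlib.sha256(b"layer bytes").hexdigest()
manifest = {"layers": [{"digest": blob_digest}]}
manifest_list = {"manifests": [{"digest": digest(manifest)}]}

# If Fedora signs manifest_list (signing not shown), a client that
# trusts that signature can verify each Manifest against the digest in
# the list, and each Blob against the digest in the Manifest -- trust
# chains down from the one signature.
assert digest(manifest) == manifest_list["manifests"][0]["digest"]
print("chain verifies")
```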
On Tue, 2016-09-06 at 13:59 -0400, Randy Barlow wrote:
One thing that may work is to have the Manifest Lists be signed instead of the Manifests themselves. The Manifest Lists are the list of Manifests that are available, one per supported arch. During docker pull, we can give the client this list in response to the initial request. The list contains URLs for each available Manifest, referenced by checksum. Thus, if the initial list is signed, the client should make follow up requests for the Manifest by checksum (it's part of the URL) and should validate the checksum of the Manifest it receives. Thus, if we sign the Manfest list, we've signed the checksum of the Manifest, which references the Blobs by checksums as well.
Aaaaaand I'm pretty sure we can't sign the Manifest Lists either since they would reference Manifests by digest, and the Manifest's digest will be dynamic due to the URL list changing in response to the requester. I feel silly for not realizing this when I proposed that last bit.
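The problem is easy to demonstrate concretely (again hashing re-serialized JSON as an approximation of Docker's real digesting; the documents and URLs are fake):

```python
import hashlib
import json

def digest(doc):
    """Approximate content digest of a JSON document."""
    raw = json.dumps(doc, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(raw).hexdigest()

base = {"layers": [{"digest": "sha256:abc"}]}

# Mirror List would inject requester-specific URLs into the Manifest...
localized = {"layers": [{"digest": "sha256:abc",
                         "urls": ["https://us-mirror.example.org/sha256:abc"]}]}

# ...which changes the Manifest's digest, so a pre-signed Manifest List
# referencing the original digest would no longer match.
print(digest(base) == digest(localized))  # False
```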
I think we need to go back to the drawing board on the signing problem.
I've documented this second draft on the wiki[0]. At this time we don't have signing figured out. Since signing could impact the plan, we are stuck on figuring that out before going further.
[0] https://fedoraproject.org/wiki/Changes/FedoraDockerRegistry
On Tue, Aug 16, 2016, at 11:33 AM, Randy Barlow wrote:
In summary, the proposal is to write a patch for the docker client that will give it the capability to accept metalink responses upon docker pull operations. We would also need to add support for Docker images to mirror list and mirror manager. Additionally, we will need a small tool to pull the content to be mirrored out of a docker registry and write them to disk in a format that can be mirrored, as well as some Ansible code to run the tool when there is new content to be mirrored.
Related to this, I think it'd be useful to target public IaaS (AWS, GCE, etc.) for inside-infra mirrors. Basically we want Fedora images to hit a S3 bucket in the region or equivalent by default for content. This is how Amazon configures Amazon Linux.
It seems like the kind of thing that we could ask Red Hat for sponsorship.
On Thu, Sep 01, 2016 at 10:49:15AM -0400, Colin Walters wrote:
On Tue, Aug 16, 2016, at 11:33 AM, Randy Barlow wrote:
In summary, the proposal is to write a patch for the docker client that will give it the capability to accept metalink responses upon docker pull operations. We would also need to add support for Docker images to mirror list and mirror manager. Additionally, we will need a small tool to pull the content to be mirrored out of a docker registry and write them to disk in a format that can be mirrored, as well as some Ansible code to run the tool when there is new content to be mirrored.
Related to this, I think it'd be useful to target public IaaS (AWS, GCE, etc.) for inside-infra mirrors. Basically we want Fedora images to hit a S3 bucket in the region or equivalent by default for content. This is how Amazon configures Amazon Linux.
It seems like the kind of thing that we could ask Red Hat for sponsorship.
This already happens for EPEL:
$ curl -s "https://mirrors.fedoraproject.org/mirrorlist?repo=epel-7&arch=x86_64&..." | head -2
# repo = epel-7 arch = x86_64 Using preferred netblock Using Internet2 country = US country = CA
http://s3-mirror-us-east-1.fedoraproject.org/pub/epel/7/x86_64/
We are currently syncing EPEL to the following S3 regions:
s3-mirror-us-east-1
s3-mirror-us-west-1
s3-mirror-us-west-2
s3-mirror-eu-west-1
s3-mirror-ap-northeast-1
So, it sounds doable for an additional target (like docker images).
Adrian
On Thu, 1 Sep 2016 17:24:50 +0200 Adrian Reber adrian@lisas.de wrote:
This already happens for EPEL:
$ curl -s "https://mirrors.fedoraproject.org/mirrorlist?repo=epel-7&arch=x86_64&..." | head -2
# repo = epel-7 arch = x86_64 Using preferred netblock Using Internet2 country = US country = CA
http://s3-mirror-us-east-1.fedoraproject.org/pub/epel/7/x86_64/
We are currently syncing EPEL to the following S3 regions:
s3-mirror-us-east-1
s3-mirror-us-west-1
s3-mirror-us-west-2
s3-mirror-eu-west-1
s3-mirror-ap-northeast-1
So, it sounds doable for an additional target (like docker images).
Note that we used to do this for Fedora releases as well, but upon looking they were really almost never used, so we stopped doing them. (That was before I think we had much of an official cloud image tho, so perhaps we could revisit).
If we do setup this for docker images, we should make sure and monitor it and see that it's used/worthwhile. Or see first and add if it seems like it would be worthwhile.
kevin
On Thu, Sep 01, 2016 at 10:49:15AM -0400, Colin Walters wrote:
pull operations. We would also need to add support for Docker images to mirror list and mirror manager. Additionally, we will need a small tool
Related to this, I think it'd be useful to target public IaaS (AWS, GCE, etc.) for inside-infra mirrors. Basically we want Fedora images to hit a S3 bucket in the region or equivalent by default for content. This is how Amazon configures Amazon Linux.
MirrorManager should be able to handle this, provided that it's configured properly (and that the mirrors are there, of course).
On Thu, Sep 1, 2016, at 10:49 AM, Colin Walters wrote:
Related to this, I think it'd be useful to target public IaaS (AWS, GCE, etc.) for inside-infra mirrors. Basically we want Fedora images to hit a S3 bucket in the region or equivalent by default for content. This is how Amazon configures Amazon Linux.
It seems like the kind of thing that we could ask Red Hat for sponsorship.
I was doing some S3 stuff the other day, and I noticed that it has a feature:
https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html
So we could look at uploading the content this way. Then, I think anyone who wants to consume it would "mirror" it privately into their own buckets, without paying the outbound network traffic costs.
Alternatively, they could set up the client program to provide the requisite auth headers, but that seems likely harder. A middle ground here would be setting up a proxy that does the auth.
I haven't tested this, but it's something to potentially look at.
On jueves, 1 de septiembre de 2016 10:49:15 AM CDT Colin Walters wrote:
On Tue, Aug 16, 2016, at 11:33 AM, Randy Barlow wrote:
In summary, the proposal is to write a patch for the docker client that will give it the capability to accept metalink responses upon docker pull operations. We would also need to add support for Docker images to mirror list and mirror manager. Additionally, we will need a small tool to pull the content to be mirrored out of a docker registry and write them to disk in a format that can be mirrored, as well as some Ansible code to run the tool when there is new content to be mirrored.
Related to this, I think it'd be useful to target public IaaS (AWS, GCE, etc.) for inside-infra mirrors. Basically we want Fedora images to hit a S3 bucket in the region or equivalent by default for content. This is how Amazon configures Amazon Linux.
It seems like the kind of thing that we could ask Red Hat for sponsorship.
This would all assume that we get legal's sign-off to provide the content into the different services. We are still waiting on the okay to provide cloud images everywhere except for AWS.
Dennis