From: Mathieu Bridon bochecha@daitauha.fr
Currently, the CGI script is set to upload files:
- to the old path if the upload uses md5
- to the new path if the upload uses sha512
The old path is as follows: /%(srpmname)s/%(filename)s/%(hash)s/%(filename)s
The new path is: /%(srpmname)s/%(filename)s/%(hashtype)s/%(hash)s/%(filename)s
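As an illustration only (this is not code from the patch), the two layouts can be expanded with the same format strings quoted above. The function name `lookaside_paths` is hypothetical; the example values are the ones from the staging test later in this thread.

```python
# Format strings copied from the commit message above.
OLD_LAYOUT = "/%(srpmname)s/%(filename)s/%(hash)s/%(filename)s"
NEW_LAYOUT = "/%(srpmname)s/%(filename)s/%(hashtype)s/%(hash)s/%(filename)s"


def lookaside_paths(srpmname, filename, hashtype, checksum):
    """Return (old_path, new_path) for one source file (hypothetical helper)."""
    fields = {
        "srpmname": srpmname,
        "filename": filename,
        "hashtype": hashtype,
        "hash": checksum,
    }
    # The old layout simply ignores the "hashtype" field.
    return OLD_LAYOUT % fields, NEW_LAYOUT % fields


old, new = lookaside_paths(
    "libcangjie",
    "libcangjie-1.3.tar.xz",
    "md5",
    "e50ed193b0e82b07d2d32ee6e62720b9",
)
print(old)
print(new)
```

The only difference between the two is the extra hash-type directory in the new layout.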
This was meant to ensure compatibility with current fedpkg which always downloads from the old path, but will eventually download from the new path when we move to sha512.
However, working more on this, I now think it would make for a smoother transition if we instead always stored the files at the new path, but just hardlinked to the old path if the upload is using md5.
This is what this patch achieves.
With this deployed in production, fedpkg could be patched to try downloading from the new path, and fall back to the old one if necessary, which decouples the migration to the new path from the migration to the new hash.
---
 roles/distgit/files/dist-git-upload.cgi | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/roles/distgit/files/dist-git-upload.cgi b/roles/distgit/files/dist-git-upload.cgi
index b4fda74..38c40db 100644
--- a/roles/distgit/files/dist-git-upload.cgi
+++ b/roles/distgit/files/dist-git-upload.cgi
@@ -112,11 +112,6 @@ def main():
     hash_dir = os.path.join(module_dir, filename, hash_type, checksum)
     msgpath = os.path.join(name, module_dir, filename, hash_type, checksum, filename)
 
-    if hash_type == "md5":
-        # Preserve compatibility with the current folder hierarchy for md5
-        hash_dir = os.path.join(module_dir, filename, checksum)
-        msgpath = os.path.join(name, module_dir, filename, checksum, filename)
-
     unwanted_prefix = '/srv/cache/lookaside/pkgs/'
     if msgpath.startswith(unwanted_prefix):
         msgpath = msgpath[len(unwanted_prefix):]
@@ -180,6 +175,11 @@ def main():
     print >> sys.stderr, '[username=%s] Stored %s (%d bytes)' % (username, dest_file, filesize)
     print 'File %s size %d %s %s stored OK' % (filename, filesize, hash_type.upper(), checksum)
 
+    # Add the file to the old path, where fedpkg is currently looking for it
+    if hash_type == "md5":
+        old_path = os.path.join(module_dir, filename, checksum, filename)
+        os.link(dest_file, old_path)
+
     # Emit a fedmsg message. Load the config to talk to the fedmsg-relay.
     try:
         config = fedmsg.config.load_config([], None)
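For readers without the full script at hand, here is a minimal, self-contained sketch of the behaviour the patch is aiming for: always store at the new path, and hardlink to the old path when the hash is md5. It is written for Python 3 (the CGI script itself is Python 2), the function name `store_upload` is hypothetical, and creating the old directory and replacing an existing old copy are assumptions of this sketch, not something the patch above does.

```python
import os
import tempfile


def store_upload(module_dir, filename, hash_type, checksum, data):
    """Write data at the new path; for md5, hardlink it at the old path too."""
    hash_dir = os.path.join(module_dir, filename, hash_type, checksum)
    os.makedirs(hash_dir, exist_ok=True)
    dest_file = os.path.join(hash_dir, filename)
    with open(dest_file, "wb") as f:
        f.write(data)

    if hash_type == "md5":
        # Assumption: the old directory may not exist yet, so create it.
        old_dir = os.path.join(module_dir, filename, checksum)
        os.makedirs(old_dir, exist_ok=True)
        old_path = os.path.join(old_dir, filename)
        # Assumption: an existing old copy is replaced by the new hardlink.
        if os.path.lexists(old_path):
            os.unlink(old_path)
        os.link(dest_file, old_path)
    return dest_file


# Demo in a throwaway directory standing in for the lookaside cache.
root = tempfile.mkdtemp()
module_dir = os.path.join(root, "libcangjie")
checksum = "e50ed193b0e82b07d2d32ee6e62720b9"
dest = store_upload(module_dir, "libcangjie-1.3.tar.xz", "md5", checksum, b"tarball")
old = os.path.join(module_dir, "libcangjie-1.3.tar.xz", checksum, "libcangjie-1.3.tar.xz")
print(os.path.samefile(dest, old))
```

Because both names point at the same inode, the file is stored once on disk regardless of how many paths reference it.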
On Thu, May 28, 2015 at 02:05:44PM +0200, Mathieu Bridon wrote:
:+1: for me :)
Pierre
infrastructure mailing list infrastructure@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/infrastructure
On Thu, May 28, 2015 at 2:11 PM Mathieu Bridon bochecha@fedoraproject.org wrote:
+1
On Thursday, May 28, 2015 02:05:44 PM Mathieu Bridon wrote:
The idea is fine, but it is a bit hard to tell from this patch what exactly is going on as there is no context.
Dennis
On Thu, May 28, 2015 at 10:36:54AM -0500, Dennis Gilmore wrote:
The idea is fine, but it is a bit hard to tell from this patch what exactly is going on as there is no context.
You mean in the code? I find the commit message to be pretty explanatory. What are you looking for?
Pierre
On Thursday, May 28, 2015 05:52:53 PM Pierre-Yves Chibon wrote:
You mean in the code? I find the commit message to be pretty explanatory. What are you looking for?
Yeah in the code.
Dennis
On Thu, 2015-05-28 at 14:17 -0500, Dennis Gilmore wrote:
Yeah in the code.
Overall, the whole of the code of this CGI script is hard to read and understand.
What I'm doing in this patch is quite simple, though, and explained in the commit message. (and that's what commit messages are for, after all)
I'd be happy to replace the CGI script altogether by something else (how about replacing distgit by pagure, and just storing the tarballs inside pagure somewhere?), but that's a very different discussion. :)
Right now, all I want is make progress on the move away from MD5, and this patch helps with that.
Given that I received two +1, and that you agreed to the general idea, I've pushed it, and Pierre-Yves is helping me run the playbook to test it in staging.
On Fri, 2015-05-29 at 11:32 +0200, Mathieu Bridon wrote:
Given that I received two +1, and that you agreed to the general idea, I've pushed it, and Pierre-Yves is helping me run the playbook to test it in staging.
And it's good we did, because there was a problem (see my follow-up patch pushed to ansible). It's now working just fine in testing:
On the server:
# find /srv/cache/lookaside/pkgs/libcangjie/
On the client:
$ fedpkgstg new-sources libcangjie-1.3.tar.xz
Uploading: libcangjie-1.3.tar.xz
######################################################################## 100.0%
Source upload succeeded. Don't forget to commit the sources file
Back on the server:
# find /srv/cache/lookaside/pkgs/libcangjie/
/srv/cache/lookaside/pkgs/libcangjie/
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/md5
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/md5/e50ed193b0e82b07d2d32ee6e62720b9
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/md5/e50ed193b0e82b07d2d32ee6e62720b9/libcangjie-1.3.tar.xz
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/e50ed193b0e82b07d2d32ee6e62720b9
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/e50ed193b0e82b07d2d32ee6e62720b9/libcangjie-1.3.tar.xz
So now, in staging, when uploading a new source file, it ends up in both locations.
If the file only exists in the old location:
On the server:
# find /srv/cache/lookaside/pkgs/libcangjie/
/srv/cache/lookaside/pkgs/libcangjie/
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/e50ed193b0e82b07d2d32ee6e62720b9
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/e50ed193b0e82b07d2d32ee6e62720b9/libcangjie-1.3.tar.xz
On the client:
$ fedpkgstg new-sources libcangjie-1.3.tar.xz
Uploading: libcangjie-1.3.tar.xz
######################################################################## 100.0%
Source upload succeeded. Don't forget to commit the sources file
Back on the server:
# find /srv/cache/lookaside/pkgs/libcangjie/
/srv/cache/lookaside/pkgs/libcangjie/
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/e50ed193b0e82b07d2d32ee6e62720b9
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/e50ed193b0e82b07d2d32ee6e62720b9/libcangjie-1.3.tar.xz
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/md5
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/md5/e50ed193b0e82b07d2d32ee6e62720b9
/srv/cache/lookaside/pkgs/libcangjie/libcangjie-1.3.tar.xz/md5/e50ed193b0e82b07d2d32ee6e62720b9/libcangjie-1.3.tar.xz
The file just gets reuploaded, and hardlinked over, which is just fine. (they have the same hash after all, so they are expected to be identical anyway)
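A `find` listing shows that both paths exist, but not that they are the same file. A small sketch of how one could check that two paths are hardlinks of each other follows; `is_hardlinked` is a hypothetical helper, and the demo uses a temporary directory rather than the real lookaside cache.

```python
import os
import shutil
import tempfile


def is_hardlinked(path_a, path_b):
    """True if both paths refer to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "new-path.tar.xz")
linked = os.path.join(workdir, "old-path.tar.xz")
copied = os.path.join(workdir, "copy.tar.xz")

with open(original, "wb") as f:
    f.write(b"tarball")
os.link(original, linked)      # second name for the same inode
shutil.copy(original, copied)  # same content, but a separate inode

print(is_hardlinked(original, linked))
print(is_hardlinked(original, copied))
```

The standard library's `os.path.samefile` performs the same device and inode comparison.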
If everybody agrees this is the desired behaviour, can we deploy it in production?
This would help a lot with continuing the migration, as it essentially means we don't need a "flag day" during which we cut uploads, run a big script to hardlink all the existing sources to the new path, then allow uploads again.
We'd still need to run such a script, but only for old tarballs; uploads of new sources could continue uninterrupted.
On Fri, May 29, 2015 at 12:43:32PM +0200, Mathieu Bridon wrote:
The file just gets reuploaded, and hardlinked over, which is just fine. (they have the same hash after all, so they are expected to be identical anyway)
If everybody agrees this is the desired behaviour, can we deploy it in production?
Can we avoid the reupload? Hopefully it won't hit too many people (because they'll remember when they have a tarball already uploaded) but it can be painful to wait for something to upload again if it's not needed.
-Toshio
On Fri, 2015-05-29 at 11:56 -0700, Toshio Kuratomi wrote:
On Fri, May 29, 2015 at 12:43:32PM +0200, Mathieu Bridon wrote:
If everybody agrees this is the desired behaviour, can we deploy it in production?
This had been deployed in production, which took me a bit by surprise as this discussion didn't seem to have been finished.
It caused a bit of a stir due to SELinux (it was disabled in staging when I tested those changes), but we fixed it with Pierre-Yves:
https://fedorahosted.org/rel-eng/ticket/6191
Can we avoid the reupload? Hopefully it won't hit too many people (because they'll remember when they have a tarball already uploaded) but it can be painful to wait for something to upload again if it's not needed.
The same script handles both:
* checking if a file exists
* uploading a file
When it's called with the upload, we can't do much in the script to avoid a reupload: the file is just going to be reuploaded, and the behaviour I described in my previous email will ensure we end up with the two copies linked to each other.
However, we could avoid the reupload in normal cases (i.e. "fedpkg upload" and "fedpkg new-sources") because pyrpkg always checks if the file exists (calling the CGI script) before uploading it.
So we could make the "check" portion of the code verify if the file is present either in the old or new location, and if it's found only in the old one, symlink it to the new one.
If that makes sense, I'll cook up a patch (and test it with SELinux enforcing in staging, this time ;) )
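A rough sketch of what such a "check" step could look like is below. This is not the eventual patch: the function name `check_file` is hypothetical, it is written for Python 3, and it follows the symlink wording above (swapping `os.symlink` for `os.link` would give the hardlink variant used on upload).

```python
import os
import tempfile


def check_file(module_dir, filename, hash_type, checksum):
    """Report whether the file is available, linking old -> new if needed."""
    new_path = os.path.join(module_dir, filename, hash_type, checksum, filename)
    old_path = os.path.join(module_dir, filename, checksum, filename)

    if os.path.exists(new_path):
        return True
    # Only md5 uploads ever lived at the old path.
    if hash_type == "md5" and os.path.exists(old_path):
        os.makedirs(os.path.dirname(new_path), exist_ok=True)
        os.symlink(old_path, new_path)
        return True
    return False


# Demo: a file that only exists at the old location.
root = tempfile.mkdtemp()
module_dir = os.path.join(root, "libcangjie")
name = "libcangjie-1.3.tar.xz"
md5 = "e50ed193b0e82b07d2d32ee6e62720b9"
os.makedirs(os.path.join(module_dir, name, md5))
with open(os.path.join(module_dir, name, md5, name), "wb") as f:
    f.write(b"tarball")

found = check_file(module_dir, name, "md5", md5)
missing = check_file(module_dir, "other.tar.xz", "md5", md5)
print(found, missing)
```

With this in place the client's existence check would succeed for old uploads, so it would have no reason to send the file again.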
On Jun 4, 2015 5:35 AM, "Mathieu Bridon" bochecha@fedoraproject.org wrote:
Yep, that sounds like a good plan!
-Toshio
On Thu, 04 Jun 2015 14:35:42 +0200 Mathieu Bridon bochecha@fedoraproject.org wrote:
This had been deployed in production, which took me a bit by surprise as this discussion didn't seem to have been finished.
Sorry, this was likely caused by my running the master playbook over all hosts after our reboots.
However, the expectation should be that when you commit something to ansible git, it will be pushed out and go live. If you don't want something live, do not push it into git yet, or, if you want it only in staging, make sure to add a 'when' for that.
It caused a bit of a stir due to SELinux (it was disabled in staging when I tested those changes), but we fixed it with Pierre-Yves:
https://fedorahosted.org/rel-eng/ticket/6191
Thanks for the quick work on that. Appreciated. ;)
Thanks,
kevin
On Thu, 2015-06-04 at 09:38 -0600, Kevin Fenzi wrote:
However, the expectation should be that when you commit something to ansible git you expect it to be pushed out and live. If you don't want something live, do not yet push it into git, or make sure if you want only staging to add a 'when' for that.
Understood.
In the case of files like this script, what is the best way to handle them?
Maybe a new dist-git-upload-staging.cgi, and in the playbook conditionalize which file to use with a "when"?
Thanks for the quick work on that. Appreciated. ;)
I broke it, I get to fix it. :)
On Friday, May 29, 2015 12:43:32 PM Mathieu Bridon wrote:
I think we should be linking to the old location and a sha512sum location not md5 but the general idea is okay
Dennis
On Wed, 2015-06-10 at 10:33 -0500, Dennis Gilmore wrote:
I think we should be linking to the old location and a sha512sum location not md5 but the general idea is okay
Doing the new location for md5 as well has a pretty big benefit though: it means we can update fedpkg to make it download from the new location independently from the move to sha512.
Decoupling the two makes for an easier migration:
- On the server, new uploads are stored in both locations **right now** (the patches we were discussing in this thread have all been merged and deployed to prod)
- I have the fedpkg patches to change the download location, I just need to submit them. Also, as I have written it, fedpkg will try the new location first and fall back to the old one if needed.
So as soon as I send those fedpkg patches (still need a bit of time), we can release a new fedpkg with them, and the migration to the new path will effectively be done.
At that point, for a source file which had only been uploaded **before** we changed the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Download failed, falling back to the old URL
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
But for a source file uploaded **after** the changes to the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Which means at that point, the migration to the new path would be pretty much done.
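The "new location first, old location as fallback" behaviour could be sketched roughly as follows. This is not the actual fedpkg patch: `candidate_urls`, `download_with_fallback` and the injected `fetch` callable are hypothetical names, and limiting the fallback to md5 is an assumption based on the old layout only ever holding md5 uploads.

```python
def candidate_urls(base_url, module, filename, hash_type, checksum):
    """Return the URLs to try, in order: new layout first, then the old one."""
    urls = ["/".join([base_url, module, filename, hash_type, checksum, filename])]
    if hash_type == "md5":
        # Assumption: only md5 files can exist at the old path.
        urls.append("/".join([base_url, module, filename, checksum, filename]))
    return urls


def download_with_fallback(urls, fetch):
    """Try each URL in order; fetch(url) returns bytes or raises OSError.

    Returns (url_used, data), or re-raises the last error if every URL fails.
    """
    if not urls:
        raise ValueError("no URLs to try")
    last_error = None
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            # In the real client this is where the
            # "Download failed, falling back to the old URL" message would go.
            last_error = exc
    raise last_error
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular HTTP library; in Python 3, `urllib.error.URLError` is a subclass of `OSError`, so a `urllib`-based fetcher would fit this shape.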
<optional step> I have a script ready, to be run on the server, which will hardlink all existing uploads from their old to their new path.
We could run it to get all files in their new path with md5, which would remove the warning from the above "fedpkg sources" output even for old uploads, but that's a cosmetic issue, so it's not even entirely required. </optional step>
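For illustration only, a one-off script of this kind could look something like the following Python 3 sketch. It is not the script mentioned above; `link_old_uploads` and `HASH_TYPES` are invented names, and the sketch assumes that every directory under a filename directory that is not named after a hashtype is an old-style md5 checksum directory.

```python
import os

# Directory names that belong to the new layout and must not be
# mistaken for old-style checksum directories.
HASH_TYPES = {"md5", "sha512"}


def link_old_uploads(root):
    """Hardlink <module>/<file>/<md5>/<file> to <module>/<file>/md5/<md5>/<file>.

    Returns the list of new paths that were created. Running it again is
    harmless: paths that already exist are skipped.
    """
    linked = []
    for module in sorted(os.listdir(root)):
        module_dir = os.path.join(root, module)
        if not os.path.isdir(module_dir):
            continue
        for filename in sorted(os.listdir(module_dir)):
            file_dir = os.path.join(module_dir, filename)
            if not os.path.isdir(file_dir):
                continue
            for checksum in sorted(os.listdir(file_dir)):
                if checksum in HASH_TYPES:
                    continue
                old_path = os.path.join(file_dir, checksum, filename)
                if not os.path.isfile(old_path):
                    continue
                new_path = os.path.join(file_dir, "md5", checksum, filename)
                if os.path.exists(new_path):
                    continue
                os.makedirs(os.path.dirname(new_path), exist_ok=True)
                os.link(old_path, new_path)
                linked.append(new_path)
    return linked
```

Since only hardlinks are created, such a run would not copy any data and could be repeated safely while uploads continue.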
And then, I have patches ready for fedpkg that I'll submit at that point, which just switch to sha512 and the new 'sources' file format. (the BSD-style one, which contains the hashtype)
We'd push that out, and it would be enough to migrate to sha512:
- fedpkg would upload new source files with sha512, they'd appear only in their new path
- fedpkg would know what hashtype to use for downloading, based on the 'sources' file, so it would still download old md5 uploads just fine.
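As a rough sketch of how the hashtype can be read off the 'sources' file, the following Python 3 snippet accepts both the current "checksum filename" lines and BSD-style "HASHTYPE (filename) = checksum" lines. It is not the fedpkg implementation; `parse_sources_line` and `BSD_LINE` are names made up for this example, and treating every old-format line as md5 is an assumption.

```python
import re

# BSD-style checksum line, e.g. "SHA512 (foo.tar.xz) = abcdef..."
BSD_LINE = re.compile(
    r"^(?P<hashtype>[A-Z0-9]+) \((?P<filename>.+)\) = (?P<checksum>[0-9a-f]+)$"
)


def parse_sources_line(line):
    """Return (hashtype, checksum, filename) for one 'sources' line."""
    line = line.strip()
    match = BSD_LINE.match(line)
    if match:
        return (
            match.group("hashtype").lower(),
            match.group("checksum"),
            match.group("filename"),
        )
    # Old format: "<checksum>  <filename>", assumed here to always be md5.
    checksum, filename = line.split(None, 1)
    return "md5", checksum, filename
```

With the hashtype recovered per line, a client can build the new-layout download path for either kind of entry, which is why old md5 entries keep working after the switch.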
<optional step> And if we really want to completely get rid of all md5, then we could run a script on the server-side to get the files in their new path for sha512 as well, even for old uploads.
And then we'd push an update to fedpkg which would ask package maintainers to run a "fedpkg new-sources" on their source files if they are still using md5 hashes, so that all the 'sources' files in distgit will end up pointing to sha512 hashes. </optional step>
----
All in all, that makes for a pretty smooth migration, with no flag day or outage during which we'd need to cut uploads temporarily.
Which is why the upload.cgi script now puts all files in their new path, even for md5 uploads. :)
On Thursday, June 11, 2015 04:59:22 PM Mathieu Bridon wrote:
On Wed, 2015-06-10 at 10:33 -0500, Dennis Gilmore wrote:
I think we should be linking to the old location and a sha512sum location not md5 but the general idea is okay
Doing the new location for md5 as well has a pretty big benefit though: it means we can update fedpkg to make it download from the new location independently from the move to sha512.
Decoupling the two makes for an easier migration:
I strongly want a flag day for the migration, because there are other things that need to be coupled with it, such as the changes to the CA setup to use our CA for logins but a well-known CA for koji.fp.o. I also want to decouple the koji hub and web interface, which will mean we need a new hostname for the hub. All of these changes have a hard requirement of a flag day, as end users will need to make changes on the client side.
- On the server, new uploads are stored in both locations **right now**
(the patches we were discussing in this thread have all been merged and deployed to prod)
- I have the fedpkg patches to change the download location, I just
need to submit them. Also, as I have written it, fedpkg will try the new location first and fallback to the old one if needed.
So as soon as I send those fedpkg patches (still need a bit of time), we can release a new fedpkg with them, and the migration to the new path will effectively be done.
At that point, for a source file which had only been uploaded **before** we changed the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Download failed, falling back to the old URL
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
But for a source file uploaded **after** the changes to the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Which means at that point, the migration to the new path would be pretty much done.
<optional step> I have a script ready, to be run on the server, which will hardlink all existing uploads from their old to their new path.
This is not optional. We have to link all the tarballs to the sha512 locations.
We could run it to get all files in their new path with md5, which would remove the warning from the above "fedpkg sources" output even for old uploads, but that's a cosmetic issue, so it's not even entirely required. </optional step>
And then, I have patches ready for fedpkg that I'll submit at that point, which just switch to sha512 and the new 'sources' file format. (the BSD-style one, which contains the hashtype)
We'd push that out, and it would be enough to migrate to sha512:
- fedpkg would upload new source files with sha512, they'd appear only
in their new path
- fedpkg would know what hashtype to use for downloading, based on the
'sources' file, so it would still download just fine old uploads with md5.
<optional step> And if we really want to completely get rid of all md5, then we could run a script on the server-side to get the files in their new path for sha512 as well, even for old uploads.
And then we'd push an update to fedpkg which would ask package maintainers to run a "fedpkg new-sources" on their source files if they are still using md5 hashes, so that all the 'sources' files in distgit will end up pointing to sha512 hashes. </optional step>
All in all, that makes for a pretty smooth migration, with no flag day or outage during which we'd need to cut uploads temporarily.
This is not even close to the only reason for a flag day. We have to have one, and we had a plan to roll it all into one; please do not go changing that.
Which is why the upload.cgi script now puts all files in their new path, even for md5 uploads. :)
Dennis
On Fri, Jun 12, 2015 at 09:18:53AM -0500, Dennis Gilmore wrote:
On Thursday, June 11, 2015 04:59:22 PM Mathieu Bridon wrote:
On Wed, 2015-06-10 at 10:33 -0500, Dennis Gilmore wrote:
I think we should be linking to the old location and a sha512sum location not md5 but the general idea is okay
Doing the new location for md5 as well has a pretty big benefit though: it means we can update fedpkg to make it download from the new location independently from the move to sha512.
Decoupling the two makes for an easier migration:
I strongly want a flag day for the migration. because there is other things that need to be coupled with it. such as the changes to ca setup to use our ca for logins but a well known ca for koji.fp.o I want to decouple thekoji hub and web interface also, which will mean we need a new hostname for the hub. all of these changes have a hard requirement of a flag day as end users will need to make changes on the client side.
...
All in all, that makes for a pretty smooth migration, with no flag day or outage during which we'd need to cut uploads temporarily.
This is not even close to the only reason for a flag day. we have to have one and we had a plan to roll it all into one, please do not go changing that.
Your reply appears to have quite a negative tone to me. That might be because it's the end of the week, or because I'm not a native speaker, but I feel some negativity which I do not understand.
I'm not seeing how Mathieu's work, which lets us avoid a flag day, is a problem. We're speaking about one change here, and that change does not require a flag day. I mean we should all be happy about this: "cool, less work, more backward compatibility, easier transition". I can only see benefits. If other changes do require a flag day, so be it, but that's then unrelated to the changes made here.
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Download failed, falling back to the old URL
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
But for a source file uploaded **after** the changes to the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Which means at that point, the migration to the new path would be pretty much done.
<optional step> I have a script ready, to be run on the server, which will hardlink all existing uploads from their old to their new path.
this is not optional. we have to link all the tarballs to the sha512 locations.
Could you expand on why this is in fact a required step? fedpkg will be able to handle both locations, right?
Thanks, Pierre
On Fri, 2015-06-12 at 09:18 -0500, Dennis Gilmore wrote:
On Thursday, June 11, 2015 04:59:22 PM Mathieu Bridon wrote:
On Wed, 2015-06-10 at 10:33 -0500, Dennis Gilmore wrote:
I think we should be linking to the old location and a sha512sum location not md5 but the general idea is okay
Doing the new location for md5 as well has a pretty big benefit though: it means we can update fedpkg to make it download from the new location independently from the move to sha512.
Decoupling the two makes for an easier migration:
I strongly want a flag day for the migration. because there is other things that need to be coupled with it. such as the changes to ca setup to use our ca for logins but a well known ca for koji.fp.o I want to decouple thekoji hub and web interface also, which will mean we need a new hostname for the hub. all of these changes have a hard requirement of a flag day as end users will need to make changes on the client side.
Why do you want to couple this change with other, unrelated ones?
We're almost done with this change (changing the download path), and very little more is needed to switch to sha512.
I really see no reason to block all the server-side changes on unrelated changes like the koji CA.
Switching to a well-known CA requires very few code changes to fedpkg, so sure, that can be coupled with switching fedpkg to sha512.
Because those are client-side changes, which take time to get into packagers' hands (it needs to be released, built, go to updates-testing, then to updates), and minimizing disruption for packagers is certainly a good idea.
But on the server-side... well everything is **already** ready for the changes of path and hashtype. (except running the script)
- On the server, new uploads are stored in both locations **right
now** (the patches we were discussing in this thread have all been merged and deployed to prod)
- I have the fedpkg patches to change the download location, I just
need to submit them. Also, as I have written it, fedpkg will try the new location first and fallback to the old one if needed.
So as soon as I send those fedpkg patches (still need a bit of time), we can release a new fedpkg with them, and the migration to the new path will effectively be done.
At that point, for a source file which had only been uploaded **before** we changed the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Download failed, falling back to the old URL
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
But for a source file uploaded **after** the changes to the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Which means at that point, the migration to the new path would be pretty much done.
<optional step> I have a script ready, to be run on the server, which will hardlink all existing uploads from their old to their new path.
this is not optional. we have to link all the tarballs to the sha512 locations.
That's not at all what this paragraph was about. I assume you instead wanted to reply to the other optional thing I mention a few paragraphs below.
I'm fine with making it required if you want, even though I don't see why it would matter.
It wouldn't matter because fedpkg will decide which hash to use for **downloads** based on the content of the 'sources' file.
And for already uploaded tarballs, the 'sources' file will tell fedpkg to use md5, not sha512.
As a result, linking all the old tarballs to the sha512 location is not needed at all for the fedpkg users (packagers, koji builds, ...) to continue working as expected.
The only reason it could be needed is if we decided that after a certain point, fedpkg would only ever download with sha512. But that would require updating all the 'sources' files for all branches of all packages, and I remember quite clearly that you told me we wouldn't do that. (in that tiny cubicle we were sharing at the Red Hat office in Brno, around Devconf)
In any case, what I'm saying is that it's not **needed**. But I'm not against doing it anyway if you prefer, as it's somewhat cleaner to have all paths working with sha512, and it wouldn't be disruptive at all anyway. :)
We could run it to get all files in their new path with md5, which would remove the warning from the above "fedpkg sources" output even for old uploads, but that's a cosmetic issue, so it's not even entirely required. </optional step>
And then, I have patches ready for fedpkg that I'll submit at that point, which just switch to sha512 and the new 'sources' file format. (the BSD-style one, which contains the hashtype)
We'd push that out, and it would be enough to migrate to sha512:
- fedpkg would upload new source files with sha512, they'd appear
only in their new path
- fedpkg would know what hashtype to use for downloading, based on
the 'sources' file, so it would still download just fine old uploads with md5.
<optional step> And if we really want to completely get rid of all md5, then we could run a script on the server-side to get the files in their new path for sha512 as well, even for old uploads.
And then we'd push an update to fedpkg which would ask package maintainers to run a "fedpkg new-sources" on their source files if they are still using md5 hashes, so that all the 'sources' files in distgit will end up pointing to sha512 hashes. </optional step>
All in all, that makes for a pretty smooth migration, with no flag day or outage during which we'd need to cut uploads temporarily.
This is not even close to the only reason for a flag day. we have to have one and we had a plan to roll it all into one, please do not go changing that.
So let me clarify a bit.
What I meant is that with the migration plan I detailed above, there wouldn't be a day on which we cut uploads temporarily, do some things, then reopen uploads.
There would certainly be a flag day when we decide to stop accepting md5 uploads, but that needs to be after fedpkg is set to upload with sha512, and that update has reached stable.
On Friday, June 12, 2015 05:02:00 PM Mathieu Bridon wrote:
On Fri, 2015-06-12 at 09:18 -0500, Dennis Gilmore wrote:
On Thursday, June 11, 2015 04:59:22 PM Mathieu Bridon wrote:
On Wed, 2015-06-10 at 10:33 -0500, Dennis Gilmore wrote:
I think we should be linking to the old location and a sha512sum location not md5 but the general idea is okay
Doing the new location for md5 as well has a pretty big benefit though: it means we can update fedpkg to make it download from the new location independently from the move to sha512.
Decoupling the two makes for an easier migration:
I strongly want a flag day for the migration. because there is other things that need to be coupled with it. such as the changes to ca setup to use our ca for logins but a well known ca for koji.fp.o I want to decouple thekoji hub and web interface also, which will mean we need a new hostname for the hub. all of these changes have a hard requirement of a flag day as end users will need to make changes on the client side.
Why do you want to couple this change with other, unrelated ones?
I want to couple the change to sha512 with the others so we have a single path of communication.
We're almost done on this change (changing the download path), and very few more is needed to switch to sha512.
I really see no reason to block all the server-side changes on unrelated changes like the koji CA.
Switching to a well-known CA requires very few code change to fedpkg, so sure, that can be coupled with switching fedpkg to sha512.
This is fine. It requires all users to get new koji configs, as the old ones will no longer work, unless we move the URLs for koji entirely.
Because those are client-side changes, which take time to get into packagers' hands (it needs to be released, built, go to updates -testing, then to updates), and minimizing disruption for packagers is certainly a good idea.
But on the server-side... well everything is **already** ready for the changes of path and hashtype. (except running the script)
- On the server, new uploads are stored in both locations **right
now** (the patches we were discussing in this thread have all been merged and deployed to prod)
- I have the fedpkg patches to change the download location, I just
need to submit them. Also, as I have written it, fedpkg will try the new location first and fallback to the old one if needed.
So as soon as I send those fedpkg patches (still need a bit of time), we can release a new fedpkg with them, and the migration to the new path will effectively be done.
At that point, for a source file which had only been uploaded **before** we changed the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Download failed, falling back to the old URL
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
But for a source file uploaded **after** the changes to the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Which means at that point, the migration to the new path would be pretty much done.
<optional step> I have a script ready, to be run on the server, which will hardlink all existing uploads from their old to their new path.
this is not optional. we have to link all the tarballs to the sha512 locations.
That's not at all what this paragraph was about. I assume you instead wanted to reply to the other optional thing I mention a few paragraphs below.
I'm fine with making it required if you want, even though I don't see why it would matter.
The reason I want it mandatory is that it would mean less confusion for users, and potentially less bandwidth if they go to upload a new tarball that's already there. It's just cosmetic.
It wouldn't matter because fedpkg will decide which hash to use for **downloads** based on the content of the 'sources' file.
And for already uploaded tarballs, the 'sources' file will tell fedpkg to use md5, not sha512.
As a result, linking all the old tarballs to the sha512 location is not needed at all for the fedpkg users (packagers, koji builds, ...) to continue working as expected.
The only reason it could be needed is if we decided that after a certain point, fedpkg would only ever download with sha512. But that would require updating all the 'sources' files for all branches of all packages, and I remember quite clearly that you told me we wouldn't do that. (in that tiny cubicle we were sharing at the Red Hat office in Brno, around Devconf)
In any case, what I'm saying is that it's not **needed**. But I'm not against doing it anyway if you prefer, as it's somewhat cleaner to have all paths working with sha512, and it wouldn't be disruptive at all anyway. :)
I think we are on the same page here
We could run it to get all files in their new path with md5, which would remove the warning from the above "fedpkg sources" output even for old uploads, but that's a cosmetic issue, so it's not even entirely required. </optional step>
And then, I have patches ready for fedpkg that I'll submit at that point, which just switch to sha512 and the new 'sources' file format. (the BSD-style one, which contains the hashtype)
We'd push that out, and it would be enough to migrate to sha512:
- fedpkg would upload new source files with sha512, they'd appear
only in their new path
- fedpkg would know what hashtype to use for downloading, based on
the 'sources' file, so it would still download just fine old uploads with md5.
<optional step> And if we really want to completely get rid of all md5, then we could run a script on the server-side to get the files in their new path for sha512 as well, even for old uploads.
And then we'd push an update to fedpkg which would ask package maintainers to run a "fedpkg new-sources" on their source files if they are still using md5 hashes, so that all the 'sources' files in distgit will end up pointing to sha512 hashes. </optional step>
All in all, that makes for a pretty smooth migration, with no flag day or outage during which we'd need to cut uploads temporarily.
This is not even close to the only reason for a flag day. we have to have one and we had a plan to roll it all into one, please do not go changing that.
So let me clarify a bit.
What I meant is that with the migration plan I detailed above, there wouldn't be a day on which we cut uploads temporarily, do some things, then reopen uploads.
There would certainly be a flag day when we decide to stop accepting md5 uploads, but that needs to be after fedpkg is set to upload with sha512, and that update has reached stable.
I do not think it has to be stable. I think updates-testing is fine. Some of the other changes I want to enable when we make the change cannot be used before the flag day event, i.e. when we push out the koji config changes, koji will break. You will only be able to use the config that is right for koji at the given time.
Dennis
On Fri, 2015-06-12 at 10:14 -0500, Dennis Gilmore wrote:
On Friday, June 12, 2015 05:02:00 PM Mathieu Bridon wrote:
On Fri, 2015-06-12 at 09:18 -0500, Dennis Gilmore wrote:
On Thursday, June 11, 2015 04:59:22 PM Mathieu Bridon wrote:
On Wed, 2015-06-10 at 10:33 -0500, Dennis Gilmore wrote:
I think we should be linking to the old location and a sha512sum location not md5 but the general idea is okay
Doing the new location for md5 as well has a pretty big benefit though: it means we can update fedpkg to make it download from the new location independently from the move to sha512.
Decoupling the two makes for an easier migration:
I strongly want a flag day for the migration. because there is other things that need to be coupled with it. such as the changes to ca setup to use our ca for logins but a well known ca for koji.fp.o I want to decouple thekoji hub and web interface also, which will mean we need a new hostname for the hub. all of these changes have a hard requirement of a flag day as end users will need to make changes on the client side.
Why do you want to couple this change with other, unrelated ones?
I want to couple the change to sha512 with the others so we have a single path of communication.
Sure, for the client-side changes it makes sense to bundle them.
Remember that this thread was about the server-side changes of the path/sha512 migration, which are all done now, and needed no outage.
We're almost done on this change (changing the download path), and very few more is needed to switch to sha512.
I really see no reason to block all the server-side changes on unrelated changes like the koji CA.
Switching to a well-known CA requires very few code change to fedpkg, so sure, that can be coupled with switching fedpkg to sha512.
This is fine. it requires all users to get new koji configs as the old ones will no longer work. unless we move the urls for koji entirely.
Yup.
I'm not yet ready to send the changes to fedpkg that actually do the switch anyway, but when I send them, I'll make it clear that they should only be pushed as an update along with the other changes you want.
- On the server, new uploads are stored in both locations
**right now** (the patches we were discussing in this thread have all been merged and deployed to prod)
- I have the fedpkg patches to change the download location, I
just need to submit them. Also, as I have written it, fedpkg will try the new location first and fallback to the old one if needed.
So as soon as I send those fedpkg patches (still need a bit of time), we can release a new fedpkg with them, and the migration to the new path will effectively be done.
At that point, for a source file which had only been uploaded **before** we changed the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Download failed, falling back to the old URL
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
But for a source file uploaded **after** the changes to the upload.cgi script, we'd get this:
$ fedpkg sources
Downloading libcangjie-1.3.tar.xz
######################################################### 100.0%
Which means at that point, the migration to the new path would be pretty much done.
<optional step> I have a script ready, to be run on the server, which will hardlink all existing uploads from their old to their new path.
this is not optional. we have to link all the tarballs to the sha512 locations.
That's not at all what this paragraph was about. I assume you instead wanted to reply to the other optional thing I mention a few paragraphs below.
I'm fine with making it required if you want, even though I don't see why it would matter.
The reason I want it mandatory is that it would be less confusion for users and potentially less bandwidth if they go to upload a new tarball thats already there,
Nope.
This is something Toshio requested earlier in this thread, and I implemented it. It's already deployed in production.
I already explained how it works, but basically, if a packager tries to upload a file which only exists at the old path, then it will get hardlinked to the new path.
There will be no actual new upload, no additional bandwidth used.
it's just cosmetic.
Indeed it is.
It wouldn't matter because fedpkg will decide which hash to use for **downloads** based on the content of the 'sources' file.
And for already uploaded tarballs, the 'sources' file will tell fedpkg to use md5, not sha512.
As a result, linking all the old tarballs to the sha512 location is not needed at all for the fedpkg users (packagers, koji builds, ...) to continue working as expected.
The only reason it could be needed is if we decided that after a certain point, fedpkg would only ever download with sha512. But that would require updating all the 'sources' files for all branches of all packages, and I remember quite clearly that you told me we wouldn't do that. (in that tiny cubicle we were sharing at the Red Hat office in Brno, around Devconf)
In any case, what I'm saying is that it's not **needed**. But I'm not against doing it anyway if you prefer, as it's somewhat cleaner to have all paths working with sha512, and it wouldn't be disruptive at all anyway. :)
I think we are on the same page here
Cool, glad it was just a misunderstanding. :)
We could run it to get all files in their new path with md5, which would remove the warning from the above "fedpkg sources" output even for old uploads, but that's a cosmetic issue, so it's not even entirely required. </optional step>
And then, I have patches ready for fedpkg that I'll submit at that point, which just switch to sha512 and the new 'sources' file format. (the BSD-style one, which contains the hashtype)
We'd push that out, and it would be enough to migrate to sha512:
- fedpkg would upload new source files with sha512, they'd
appear only in their new path
- fedpkg would know what hashtype to use for downloading, based
on the 'sources' file, so it would still download just fine old uploads with md5.
<optional step> And if we really want to completely get rid of all md5, then we could run a script on the server-side to get the files in their new path for sha512 as well, even for old uploads.
And then we'd push an update to fedpkg which would ask package maintainers to run a "fedpkg new-sources" on their source files if they are still using md5 hashes, so that all the 'sources' files in distgit will end up pointing to sha512 hashes. </optional step>
All in all, that makes for a pretty smooth migration, with no flag day or outage during which we'd need to cut uploads temporarily.
This is not even close to the only reason for a flag day. we have to have one and we had a plan to roll it all into one, please do not go changing that.
So let me clarify a bit.
What I meant is that with the migration plan I detailed above, there wouldn't be a day on which we cut uploads temporarily, do some things, then reopen uploads.
There would certainly be a flag day when we decide to stop accepting md5 uploads, but that needs to be after fedpkg is set to upload with sha512, and that update has reached stable.
I do not think it has to be stable. I think updates-testing is fine.
Sure, that doesn't change my point much, though. :)
On 05/29/2015 11:32 AM, Mathieu Bridon wrote:
Overall, the whole of the code of this CGI script is hard to read and understand.
What I'm doing in this patch is quite simple, though, and explained in the commit message. (and that's what commit messages are for, after all)
I'd be happy to replace the CGI script altogether by something else (how about replacing distgit by pagure, and just storing the tarballs inside pagure somewhere?), but that's a very different discussion. :)
I've started a discussion about this here: https://github.com/release-engineering/dist-git/issues/1
Regards,
infrastructure@lists.fedoraproject.org