Dear all,
Recently I have been looking into our lookaside tree. The tree is laid out as follows:
package_name/tarball_name.tar.gz/md5sum_tarball/tarball_file.tar.gz
I ran into this situation:
http://pkgs.fedoraproject.org/lookaside/pkgs/389-admin/389-admin-1.1.12.t...
where, as you can see, a single tarball has 3 different md5 sums.
So I got curious as to how often we encounter this situation. I took the
output of the tree command on the lookaside cache and looked into it.
The summary data is:
173083 sources checked
2672 packages skipped because retired
5568 packages had multiple md5 sums for at least one of their versions
That's about 3.22% (of the total)
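
A minimal sketch of this kind of count, assuming the lookaside paths are
available as a list of strings in the layout shown above. The function name
and the sample paths are made up for illustration; only the two totals in the
last line come from the summary.

```python
from collections import defaultdict


def find_multiple_md5(paths):
    """Group lookaside paths by (package, tarball) and keep those with >1 md5."""
    md5s = defaultdict(set)
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) != 4:
            continue  # not a package/tarball/md5sum/file entry
        package, tarball, md5sum, _filename = parts
        md5s[(package, tarball)].add(md5sum)
    return {key: sorted(sums) for key, sums in md5s.items() if len(sums) > 1}


# Hypothetical sample paths, not real lookaside entries
sample = [
    "foo/foo-1.0.tar.gz/aaaa/foo-1.0.tar.gz",
    "foo/foo-1.0.tar.gz/bbbb/foo-1.0.tar.gz",
    "bar/bar-2.0.tar.gz/cccc/bar-2.0.tar.gz",
]
dupes = find_multiple_md5(sample)
print(dupes)

# Share quoted in the summary: 5568 out of 173083
percent = round(100 * 5568 / 173083, 2)
print(percent)
```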
When we talked about it at the infrastructure meeting yesterday, we discussed
that there might be multiple causes for this:
- bad upload, something just went wrong when uploading
- mistake upstream, upstream releasing another tarball with the same name
- mistake from the packager, where the packager re-generated or changed the
tarball and uploaded it again.
So this morning I looked at the sizes of the uploaded tarballs. I think some
cases are clear examples of bad uploads:
Curious package 389-ds-base sources: 389-ds-base-1.2.11.9.tar.bz2
4059d198768f9f8dc9372dc1c54bc3c3: 14
a84966b47369a8603da117f7b7fd923c: 2958896
While others don't quite look like bad uploads:
Curious package 0ad sources: 0ad-0.0.12-alpha-unix-build.tar.xz
0ff92fb2b22b5384067cdd88b89e5450: 8693880
79297345368c09ae55fdb0dba69763ce: 8545452
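
The distinction between the two examples above can be sketched as a simple
size comparison. This is only a rough heuristic: the function name and the 1%
threshold are assumptions picked for illustration, and a copy that passes the
check still needs a human look.

```python
def looks_like_bad_upload(sizes, ratio=0.01):
    """True if the smallest copy is under `ratio` of the largest one.

    The default ratio is an arbitrary illustrative threshold.
    """
    return min(sizes) < ratio * max(sizes)


# Sizes taken from the two examples above
examples = {
    "389-ds-base-1.2.11.9.tar.bz2": [14, 2958896],
    "0ad-0.0.12-alpha-unix-build.tar.xz": [8693880, 8545452],
}
for tarball, sizes in examples.items():
    if looks_like_bad_upload(sizes):
        verdict = "likely bad upload"
    else:
        verdict = "needs a closer look"
    print(f"{tarball}: {verdict}")
```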
So I am attaching to this email the full output with the sizes. Feel free to
have a look at it and please try to avoid uploading a modified tarball under
the same name, or work with upstream to avoid these situations.
Just a side note: this has no impact on our build system as it uses the
`sources` file present in the git repo which contains the md5 sum to use, so it
will use the right one :)
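
To illustrate why the md5 sum in `sources` picks out exactly one copy, here is
a small sketch. It assumes each line of `sources` has the form
"<md5sum>  <tarball name>" and reuses the tree layout shown at the top of this
email. The function name, package name and md5 value are hypothetical, and
this is not the build system's actual code.

```python
def lookaside_paths(package, sources_text):
    """Map each 'md5sum  filename' line to its path in the lookaside tree."""
    paths = []
    for line in sources_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        md5sum, tarball = fields
        paths.append(f"{package}/{tarball}/{md5sum}/{tarball}")
    return paths


# Hypothetical sources content, not taken from a real package
sources_text = "0123456789abcdef0123456789abcdef  foo-1.0.tar.gz\n"
print(lookaside_paths("foo", sources_text))
```

Because the md5 sum is part of the path, other copies uploaded under the same
tarball name sit in different directories and are never picked up.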
Hope this helps,
Pierre