On Thu, Jan 6, 2011 at 6:10 AM, Tomas Hoger <thoger@redhat.com> wrote:

>> Another report I made which may or may not be useful to the security
>> team is a list of packages between Debian and Fedora that are roughly
>> equivalent, irrespective of what the package names are
>> https://github.com/silviocesare/Equivalent-Packages/blob/master/NearestNeighbour/Debian5_Fedora13_Matches
>
> Out of curiosity, how should Similarity number be interpreted?


I compare the contents of source packages between distributions. The more similar the source packages, the higher the similarity. If the sources are identical, the similarity is 1; if they are completely unrelated, the similarity is 0.
 
This is how equivalent packages can be identified irrespective of the package names: by the similarity between their sources. A threshold is set, and two packages are considered equivalent when their similarity reaches it.
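
As a rough illustration only (this is not the code used to produce the report), the idea can be sketched in Python. The sketch assumes similarity is the Jaccard index over the sets of file-content hashes in each source package, which gives 1 for identical sources and 0 for unrelated ones; the actual measure, the 0.5 threshold, and the package names below are all assumptions made for the example.

```python
import hashlib


def file_hashes(files):
    """Turn a {path: bytes} source tree into the set of its file-content hashes."""
    return {hashlib.sha256(data).hexdigest() for data in files.values()}


def similarity(a, b):
    """Jaccard index of two hash sets: 1.0 if identical, 0.0 if nothing is shared."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_matches(debian, fedora, threshold=0.5):
    """For each Debian package, keep its most similar Fedora package
    (nearest neighbour) if the score reaches the assumed threshold."""
    matches = {}
    for dname, dset in debian.items():
        fname, score = max(
            ((f, similarity(dset, fset)) for f, fset in fedora.items()),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        if fname is not None and score >= threshold:
            matches[dname] = (fname, score)
    return matches


# Hypothetical packages: same sources under different names and paths.
debian = {
    "libfoo-dev": file_hashes({"foo.c": b"int foo;", "bar.c": b"int bar;"}),
}
fedora = {
    "foo-devel": file_hashes(
        {"src/foo.c": b"int foo;", "src/bar.c": b"int bar;", "extra.c": b"int x;"}
    ),
    "baz": file_hashes({"baz.c": b"int baz;"}),
}

result = best_matches(debian, fedora)
print(result)
```

Because only file contents are hashed, the match does not depend on package names or on where the files sit in the tree.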

I added some more content yesterday in the git repo to show similarity between packages within a distribution. This could be useful for finding otherwise duplicated sources - a separate source package could be created that is common to the two highly similar packages. I haven't really gone down that track before, so take it for what you may.
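
A minimal sketch of the within-distribution comparison, again only an illustration: it assumes each package is already represented as a set of file-content hashes, uses the Jaccard index as a stand-in for the real similarity measure, and picks 0.8 as an arbitrary threshold. The package names and hash values are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: 1.0 if identical, 0.0 if disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_pairs(packages, threshold=0.8):
    """Return (name1, name2, score) for every pair of packages in one
    distribution whose similarity reaches the assumed threshold,
    most similar first."""
    pairs = []
    for n1, n2 in combinations(sorted(packages), 2):
        score = jaccard(packages[n1], packages[n2])
        if score >= threshold:
            pairs.append((n1, n2, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical packages, each a set of file-content hashes.
packages = {
    "app-one": {"h1", "h2", "h3", "h4"},
    "app-two": {"h1", "h2", "h3", "h4", "h5"},
    "other": {"h9"},
}

pairs = similar_pairs(packages)
print(pairs)
```

Pairs reported this way are candidates for sharing a common source package. Comparing every pair is quadratic in the number of packages, so a full distribution would likely need some form of indexing first.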

--
Tomas Hoger / Red Hat Security Response Team

--
Silvio