On Thu, Jan 6, 2011 at 6:10 AM, Tomas Hoger
<thoger@redhat.com> wrote:
Out of curiosity, how should Similarity number be interpreted?
I compare the contents of source packages between distributions. The more similar the source packages, the higher the similarity: identical sources score 1, and completely unrelated sources score 0.
This is how equivalent packages can be determined irrespective of the package names - based on similarity between their sources. A threshold on the similarity decides whether two packages are treated as equivalent.
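To make the 0-to-1 scale concrete, here is a rough sketch of one way such a score could be computed. This is not the code from the git repo: the Jaccard measure over file content hashes, the function names, and the 0.8 threshold are all assumptions chosen only for illustration.

```python
import hashlib

def file_digests(files):
    """Reduce a package, given as {path: bytes}, to a set of content hashes."""
    return {hashlib.sha256(data).hexdigest() for data in files.values()}

def similarity(pkg_a, pkg_b):
    """Jaccard similarity of two packages' file contents (assumed measure)."""
    a, b = file_digests(pkg_a), file_digests(pkg_b)
    if not a and not b:
        return 1.0  # two empty packages are trivially identical
    return len(a & b) / len(a | b)

THRESHOLD = 0.8  # hypothetical cut-off, not taken from the original tool

def equivalent(pkg_a, pkg_b, threshold=THRESHOLD):
    """Treat two packages as equivalent when their score reaches the threshold."""
    return similarity(pkg_a, pkg_b) >= threshold
```

Because only file contents are hashed, paths and package names play no part in the score, which is what lets a renamed package still match its counterpart in another distribution.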
I added some more content to the git repo yesterday to show similarity between packages within a distribution. This could be useful for finding duplicated sources: a separate source package could be created that is shared by the two highly similar packages. I haven't really gone down that track before, so take it for what it's worth.
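For the within-distribution case, something along these lines could list candidate pairs together with the content they share. Again, this is only a sketch under assumptions: each package is represented as a set of file content hashes, the score is a Jaccard ratio, and `duplicate_candidates` and its default threshold are hypothetical names and values, not what the repo does.

```python
from itertools import combinations

def duplicate_candidates(packages, threshold=0.5):
    """Find pairs of packages in one distribution with heavily overlapping sources.

    packages: {name: set of file content hashes} (assumed representation).
    Returns (name_a, name_b, score, shared_hashes) tuples, highest score first.
    """
    results = []
    for name_a, name_b in combinations(sorted(packages), 2):
        a, b = packages[name_a], packages[name_b]
        union = a | b
        if not union:
            continue  # nothing to compare for two empty packages
        shared = a & b
        score = len(shared) / len(union)
        if score >= threshold:
            results.append((name_a, name_b, score, shared))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

The shared hashes in each result are the files that could, in principle, be moved into a common source package.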
--
Tomas Hoger / Red Hat Security Response Team