On Fri, 1 Sept 2023 at 15:53, Richard Fontana <rfontana(a)redhat.com> wrote:
On Fri, Sep 1, 2023 at 8:09 AM Vít Ondruch <vondruch(a)redhat.com> wrote:
>
> Can we attach a percentage to each license? E.g. "The kernel is
> 99.9625 % GPL-2.0-only"
>
> I am proposing this mostly as a joke. Nevertheless, it is interesting
> information IMHO.
This sounds similar to an idea I had a few years ago, when I was
somewhat more enthusiastic about general use of ScanCode as a tool. I
think ScanCode can give you results of licenses identified in source
code in a percentage ranking, so my idea was "just list the top few
detected licenses". The idea had two problems. First, it was
scanner-dependent (or else would lead to inconsistent approaches
depending on the scanner used). Second, the mere fact that a license
shows up a lot in detections doesn't necessarily mean it *covers*
content in the package to the extent that would suggest (i.e., some
results would be pretty misleading).

It also conflicts with what I think of as a "truth in labeling"
principle, which may be something that should guide us here to some
degree. It is not uncommon for a package to have a small portion of
code covered by a license that is in some sense problematic or
unexpected in a way that is disproportionate to how often it appears.
To use the example of the kernel, there's the presence of Clear BSD
(SPDX: BSD-3-Clause-Clear) on some source files. Arguably there is
value in exposing that fact, especially for those of us who don't
consider that to be an open source license. But truth in labeling
doesn't necessarily mean "list everything in precise detail".

Apologies if this has been discussed before, but why not use something
like the debian/copyright file? The License tag could be just a list
of all licenses found, without AND or OR, to avoid the combinatorial
issue, and then a copyright file could list exactly what applies to
what.
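
A rough sketch of how that split could work, in Python. The stanza
list, the paths, and the function names are all hypothetical, and the
"last matching pattern wins" rule is an assumption borrowed from how
Files paragraphs are matched in debian/copyright:

```python
from fnmatch import fnmatch

# Hypothetical per-file stanzas, loosely modelled on debian/copyright:
# (glob pattern, SPDX license identifier). Paths are made up.
STANZAS = [
    ("*", "GPL-2.0-only"),
    ("drivers/example/*", "BSD-3-Clause-Clear"),
    ("lib/example/*", "MIT"),
]


def license_tag(stanzas):
    """Flat, de-duplicated list of licenses found, with no AND/OR."""
    return sorted({lic for _, lic in stanzas})


def license_for(path, stanzas):
    """License of the last stanza whose pattern matches path, else None."""
    result = None
    for pattern, lic in stanzas:
        if fnmatch(path, pattern):
            result = lic  # later stanzas override earlier ones
    return result
```

Here license_tag(STANZAS) would feed the License tag, while
license_for() answers "what applies to this file", so a rarely used but
notable license still shows up in the tag without any AND/OR
expression.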
--
Iñaki Úcar