On 8/31/23 2:39 AM, Daniel P. Berrangé wrote:
On Thu, Aug 24, 2023 at 02:52:21PM -0400, Richard Fontana wrote:
> Some of the complaints that have surfaced since the migration from the
> Callaway system to SPDX seem to be, at root, an aesthetic distaste for
> complex license expressions in RPM license metadata. This may explain
> why some favor application of "effective license" analysis. I suspect
> there is also a sort of psychological desire to hide the underlying
> licensing complexity that characterizes many packages.
Let's take the proposed change to the kernel spec:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2648/diffs#b49...
as an example of "complex license expressions" for which
there is likely an aesthetic distaste. Each distinct
SPDX-License-Identifier tag expression is combined such
that we end up with:
License: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-2-Clause) AND ((GPL-2.0-only WITH
Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR
CDDL-1.0) AND ((GPL-2.0-only WITH Linux-syscall-note) OR Linux-OpenIB) AND ((GPL-2.0-only
WITH Linux-syscall-note) OR MIT) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR
BSD-3-Clause) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR MIT) AND BSD-2-Clause AND
BSD-3-Clause AND BSD-3-Clause-Clear AND GPL-1.0-or-later AND (GPL-1.0-or-later OR
BSD-3-Clause) AND (GPL-1.0-or-later WITH Linux-syscall-note) AND GPL-2.0-only AND
(GPL-2.0-only OR Apache-2.0) AND (GPL-2.0-only OR BSD-2-Clause) AND (GPL-2.0-only OR
BSD-3-Clause) AND (GPL-2.0-only OR CDDL-1.0) AND (GPL-2.0-only OR Linux-OpenIB) AND
(GPL-2.0-only OR MIT) AND (GPL-2.0-only OR X11) AND (GPL-2.0-only WITH Linux-syscall-note)
AND GPL-2.0-or-later AND (GPL-2.0-or-later OR BSD-2-Clause) AND (GPL-2.0-or-later OR
BSD-3-Clause) AND (GPL-2.0-or-later OR MIT) AND (GPL-2.0-or-later WITH GCC-exception-2.0)
AND (GPL-2.0-or-later WITH Linux-syscall-note) AND ISC AND LGPL-2.0-or-later AND
(LGPL-2.0-or-later OR BSD-2-Clause) AND (LGPL-2.0-or-later WITH Linux-syscall-note) AND
LGPL-2.1-only AND (LGPL-2.1-only OR BSD-2-Clause) AND (LGPL-2.1-only WITH
Linux-syscall-note) AND LGPL-2.1-or-later AND (LGPL-2.1-or-later WITH Linux-syscall-note)
AND (Linux-OpenIB OR GPL-2.0-only) AND (Linux-OpenIB OR GPL-2.0-only OR BSD-2-Clause) AND
MIT AND (MIT OR Apache-2.0) AND (MIT OR GPL-2.0-only) AND (MIT OR GPL-2.0-or-later) AND
(MIT OR LGPL-2.1-only) AND (MPL-1.1 OR GPL-2.0-only) AND (X11 OR GPL-2.0-only) AND (X11 OR
GPL-2.0-or-later) AND Zlib AND (copyleft-next-0.3.1 OR GPL-2.0-or-later) AND
(Redistributable, no modification permitted)
Given that the kernel is a very large package with many files and it has
adopted SPDX ids at the file level (which means the licensing info is
far more complete and easier to parse :) - there is nothing surprising
to me about the length of this string. It is what it is!
While the majority of files in the kernel are "GPL-2.0-only",
a number of files are offered under a choice of licenses (OR).
Even if 99% of files were simply GPL-2.0-only, it only takes
a handful of files being offered under a choice, to result in
an enormous SPDX expression like the one above. In the above
example, at a bare minimum it would only take 30 files, out
of the kernel's 80,000 to have distinct licence choices to
cause the existence of the above expression.
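
To make the arithmetic concrete, here is a minimal sketch of how ANDing each distinct per-file expression produces this effect. The function name and the file paths are made up for illustration, and this is not the tooling actually used for the kernel spec; it only assumes that each file contributes one SPDX-License-Identifier expression.

```python
def combine_file_licenses(file_licenses):
    """AND together each distinct per-file SPDX expression.

    file_licenses maps a (hypothetical) file path to the expression
    found in that file's SPDX-License-Identifier tag.
    """
    distinct = sorted(set(file_licenses.values()))

    def wrap(expr):
        # Parenthesize compound expressions so they stay grouped under AND.
        return f"({expr})" if " OR " in expr or " WITH " in expr else expr

    return " AND ".join(wrap(e) for e in distinct)


# 1000 files under a single license, plus three files with other expressions.
files = {f"drivers/file{i}.c": "GPL-2.0-only" for i in range(1000)}
files["a.c"] = "GPL-2.0-only OR MIT"
files["b.c"] = "MPL-1.1 OR GPL-2.0-only"
files["c.h"] = "GPL-2.0-only WITH Linux-syscall-note"

print(combine_file_licenses(files))
```

The 1000 identical files collapse into a single term, while each of the three outliers adds a clause of its own, so the length of the result tracks the number of distinct expressions rather than the number of files.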
That's an interesting point, but I'm not sure how we could justify some
kind of an exception in such a case
While this is an accurate reflection of the range of distinct
file license choices, I'm not convinced that this approach is
especially beneficial to Fedora users.
well, it's not really just about Fedora users - besides the benefit
downstream, I think there is some benefit to what Fedora is doing in a
broader, example-setting, ecosystem sense. I guess part of this feeling
comes from my thinking that any desire or attempt to obscure the license
complexity is not a good thing and potentially creates more work or
issues - reflecting the reality, to me, sets a good precedent
What purpose does it serve to list "MPL-1.1 OR GPL-2.0-only"
and "MIT OR LGPL-2.1-only", etc if only perhaps < 1% of files
carry this choice and we're not telling the user which 1% of
files it applies to ?
they can run a license scanner and create an SPDX document that shows
the file level license info to determine this. And that report will be
far more complex and lengthy than what you came up with above ;)
In that way, what you have above is a useful "summary" and accurate
reflection of the big picture
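
As a rough illustration of the file-level view, here is a small sketch that only collects SPDX-License-Identifier tags from a source tree. It is not a substitute for a real license scanner (it does no license text matching and emits no SPDX document); the function name and the 4096-byte read limit are assumptions made for the example.

```python
import os
import re

SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def scan_tree(root):
    """Map each file under root to the expression in its first SPDX tag."""
    found = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    head = fh.read(4096)  # assume the tag sits near the top
            except OSError:
                continue  # unreadable file: skip it
            match = SPDX_TAG.search(head)
            if match:
                expr = match.group(1).strip()
                # Drop a trailing C comment terminator, if present.
                if expr.endswith("*/"):
                    expr = expr[:-2].strip()
                found[os.path.relpath(path, root)] = expr
    return found
```

Calling scan_tree() on an unpacked source tree returns one entry per tagged file, which for a tree the size of the kernel is a far longer listing than the License: field.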
The previous effective license analysis addressed this problem,
such that everything reduced down to "GPLv2 and Redistributable"
I don't want to suggest going back to effective analysis as I
think that was overly simplified, but perhaps we can finesse
what we're doing today.
i.e. rather than trying to maintain the full list of choices, can
we eliminate all the OR clauses, such that we present just a
flat list of each distinct SPDX license name that is found.
IOW, the above kernel SPDX expression would be
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND
CDDL-1.0 AND copyleft-next-0.3.1 AND GPL-1.0-or-later AND
GPL-1.0-or-later-WITH-Linux-syscall-note AND GPL-2.0-only AND
GPL-2.0-only-WITH-Linux-syscall-note AND GPL-2.0-or-later AND
GPL-2.0-or-later-WITH-GCC-exception-2.0 AND GPL-2.0-or-later-WITH-Linux-syscall-note AND
ISC AND LGPL-2.0-or-later AND LGPL-2.0-or-later-WITH-Linux-syscall-note AND LGPL-2.1-only
AND LGPL-2.1-only-WITH-Linux-syscall-note AND LGPL-2.1-or-later AND
LGPL-2.1-or-later-WITH-Linux-syscall-note AND Linux-OpenIB AND MIT AND MPL-1.1 AND
Redistributable, no modification permitted AND X11 AND Zlib
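
For reference, the flattening described above can be sketched roughly as follows. This is only an illustration of the idea, with a made-up function name: it keeps each "X WITH Y" pair as a single term (written with spaces rather than hyphens), drops the OR grouping, and sorts case-insensitively. It splits on the operators instead of parsing the expression properly.

```python
import re


def flatten_expression(expr):
    """Return the distinct license terms of an SPDX expression, ANDed together.

    OR choices are discarded; a "X WITH Y" pair is kept as one term,
    since the exception only has meaning alongside its license.
    """
    # Drop grouping parentheses, then split on the AND/OR operators.
    bare = expr.replace("(", " ").replace(")", " ").strip()
    terms = re.split(r"\s+(?:AND|OR)\s+", bare)
    # Normalize whitespace inside each term and de-duplicate.
    distinct = {" ".join(t.split()) for t in terms if t.strip()}
    return " AND ".join(sorted(distinct, key=str.lower))


expr = (
    "((GPL-2.0-only WITH Linux-syscall-note) OR MIT) AND GPL-2.0-only AND "
    "(GPL-2.0-only OR MIT) AND (MPL-1.1 OR GPL-2.0-only) AND Zlib"
)
print(flatten_expression(expr))
```

The result lists every license and exception that appears somewhere in the package, but it no longer says which of them were offered as alternatives.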
but then this would be an exception to our original policy? and how
would we articulate that? I'm not sure why this is really any "better"
than your original - it's just shorter and truncated.
oh, and we should take a look at the "Redistributable, no modification
permitted" ones... that is likely the firmware licenses that were never
captured
> I do think that the current approach can be criticized as being overly
> pedantic, and perhaps also internally contradictory (some of Florian's
> recent comments get at the various ways in which we are being
> contradictory). We have a still-undocumented rule that what I call
> "true public domain" should not be reflected in the License: field
> (unless it would otherwise be empty), yet we have carefully attempted
> to collect nonstandard public domain dedication statements and cover
> those by `LicenseRef-Fedora-Public-Domain`. We have been using a
> similar approach with `LicenseRef-Fedora-UltraPermissive`. These
> basically replace Callaway system names "Public domain" (though this
> was sometimes used for "true public domain") and "Freely
> redistributable without restrictions", respectively.
>
> I think it can reasonably be argued that there is little point in
> including `LicenseRef-Fedora-Public-Domain` and
> `LicenseRef-Fedora-UltraPermissive` in the License: field since they
> are associated with no conditions or obligations. In those special
> cases where the License: field would otherwise be empty, we can ask
> SPDX to create unique identifiers for the license text in question.
I think there is value in LicenseRef-Fedora-Public-Domain, etc
because it expresses the fact that license analysis has actually
been performed and these public domain choices have been correctly
identified. I don't like the need to special case the omission
to avoid an entirely empty License: field. If we have a need to
record LicenseRef-Fedora-Public-Domain in any scenario, we should
be consistent.
e.g. consider a package that is 100% public domain initially, so we
have to record that to avoid empty field:
License: LicenseRef-Fedora-Public-Domain
then one day a file is added which is MIT. I would find it
pretty strange for the rule to say we can now drop the
LicenseRef-Fedora-Public-Domain to go to just record:
License: MIT
when 99% of the files are still LicenseRef-Fedora-Public-Domain
and only a single file is MIT.
IMHO the package should be changed to say
License: LicenseRef-Fedora-Public-Domain AND MIT
IOW, I think we should always be recording the license, even if
it is a public domain LicenseRef term.
100% agree
> We might want to extend this principle to other things, such as GPL
> exceptions that entail no conditions in the use case encountered in
> particular packages. (There is already an old issue about this, I
> think concerning the Bison exception.)
Personally I like the way we're now recording the existence of each
license and exception, just not the creation of the combinatorial
expansion of each license choice.
> This wouldn't do *that* much to make License: fields simpler, so maybe
> it's not particularly worthwhile. There is also the problem that if we
> make it optional, package maintainers may be less likely to scrutinize
> things that are assumed to fall into these kinds of categories, when
> in some cases they actually wouldn't, although I think it's now clear
> that those situations are uncommon. In theory we'd still expect
> package maintainers to submit issues to have things that seem to
> qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be
> challenging to enforce that expectation and the Fedora Legal team
> would have to end up doing all that work themselves, which might be a
> justifiable result.
>
> As with abandoning the "license of the binary" rule, this would
> seemingly be a major departure from the principles established under
> the Callaway system.
>
> Any thoughts on this?
With regards,
Daniel