I think Richard said that he would start a thread like this, but it
hasn't happened, so I feel like I should get this off my chest now.
https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_l...
starts with this:
| No “effective license” analysis
|
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. No further analysis should be done regarding what the
| "effective" license is, such as analysis based on theories of GPL
| interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.
This is contradictory. I think there are two aspects here:
* Determine possible licenses that end up in the binary package.
* Perform algebraic simplifications on the license list.
Both analyses are forms of effective licensing analysis. Of course, you
cannot derive an SPDX identifier without doing any analysis. However, I
strongly believe that the first approach (determining the binary package
license) is itself a form of effective licensing analysis, and similar
reasons for package maintainers not doing this apply. The derived
SPDX identifier will reflect both the package source code and what went
into the build system.
Below, I'm collecting a list of observations of what I believe is the
current approach in this area, as taken by package maintainers carrying
out the SPDX conversion. To me, it strongly suggests that the SPDX
identifiers we derive today do not accurately reflect binary RPM package
licensing, even when lots of package maintainers put in the extra effort
to determine binary package licenses.
* Most package maintainers probably assume that License: tags on all
built RPMs (source RPMs and binary RPMs) should reflect binary package
contents, at least when all subpackages are considered in aggregate.
Often, Source RPMs contain the same License: line as binary RPMs.
* No algebraic simplifications on License: lines are performed.
* All forms of dynamic linking are ignored for License: tags. This
covers ELF (e.g., C, C++), but also Python, Java, and other languages
with late binding.
* C/C++ header file contents are ignored for License: tags, regardless of
header file complexity (e.g., substantial code in templates or inline
functions is not treated specially).
* Statically linked GCC and glibc startup code is ignored and does not
show up in License: lines. The license of glibc startup code isn't
even in SPDX yet, so it's not just Fedora who is ignoring this.
* Statically linked libgcc support code is ignored (e.g., outline
atomics on aarch64, FMV support code on x86-64). This code comes with
the compiler, but is compiled from C sources that ship with the
compiler. These items overlap with the startup code, but licensing
could theoretically be different.
* Some shared objects come with statically linked support code. I doubt
that many package maintainers are aware of that, so they effectively
ignore the licensing impact of that. It's structurally similar to
inline functions and templates in header files.
* Output from source code generators such as autoconf, bison and flex
is often (but not always) ignored, in some cases even if the generated
code ships in the source RPM and is compiled as-is, without
regeneration. (autoconf can generate more than just build scripts.)
* Licenses of crate build-dependencies end up in License: tags of RPM
packages. This is a form of static linking analysis for which we have
tooling, and it is mandated by the guidelines. It only covers the
Rust part; other gaps in filling out License: are still there. (I
don't know if the generated License: tags are accurate for individual
subpackages; it seems unlikely.) Go might have something similar.
* Sometimes we ignore upstream SPDX identifiers if we believe them to be
incorrect, but that approach is not consistent, as far as I know.
* There seems to be some confusion about whether AND or OR is the
right separator for SPDX tags in License: lines.
* Some package maintainers, when translating to SPDX, merely translate
the existing License: line as best as they can, without looking at the
actual sources or produced binaries.
I looked around a bit and there are no documented product requirements
internally, so I don't think we can justify investing in tooling or
training to improve data quality. (I'll keep digging, though.)
In the light of this, I would like to suggest updating the guidelines in
the following way:
The License: line should be based on the sources only. Using a tool
such as Fossology to discover relevant licenses and their SPDX tags is
sufficient. No analysis of how licenses from package source code or the
build environment propagate into binary RPMs should be performed.
Individual SPDX identifiers that a tool has listed should be separated
by AND. Package maintainers are encouraged to re-run license analysis
tooling on the source code as part of major package rebases, and
update the License: tag accordingly.
To me, that seems to be much more manageable.
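To illustrate how little is left to do under that proposal, here is a
minimal Python sketch of the joining step. The function name
license_field and the input list are hypothetical; the list stands in
for whatever identifiers a scanner such as Fossology reports for the
sources. Dropping exact duplicates and parenthesizing entries that
already contain OR are assumptions of this sketch, not something the
proposed wording spells out.

```python
def license_field(detected):
    """Join tool-reported SPDX expressions with AND (hypothetical helper).

    `detected` is a list of SPDX identifiers or expressions as reported
    by a source scanner. No propagation or compatibility analysis is done.
    """
    seen = []
    for expr in detected:
        expr = expr.strip()
        # Assumption: exact duplicates are listed only once.
        if expr and expr not in seen:
            seen.append(expr)
    # Assumption: an entry that is itself an OR expression is wrapped in
    # parentheses so that joining with AND does not change its meaning.
    parts = [f"({e})" if " OR " in e and len(seen) > 1 else e for e in seen]
    return " AND ".join(parts)


print("License: " + license_field(
    ["MIT", "GPL-2.0-or-later", "MIT", "Apache-2.0 OR MIT"]))
# prints: License: MIT AND GPL-2.0-or-later AND (Apache-2.0 OR MIT)
```

Everything beyond this (deciding which of those licenses actually reach
a given binary RPM) is exactly the analysis the proposal would drop.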
Thoughts?
Thanks,
Florian