One of the various reasons for having package reviews is to have a human verify that the packager's choice of License: tag is valid. The Packaging Committee was faced with a request (https://pagure.io/packaging-committee/issue/1007) that has us questioning just how much license review is required.
Are any of the following acceptable?
1) Trust the packager to do a license review, with no reviewer verification.
2) Trust the output of an automated tool which attempts to detect project licenses (such as askalono).
3) Trust the license tag from a project hosting service such as github? (I understand that the answer may depend on the hosting service.)
Depending on what is acceptable, we may be able to reduce bureaucracy a bit. I know that back when I did package reviews, the license review was often the most difficult part.
- J<
On Fri, 24 Jul 2020, Jason Tibbitts wrote:
> Are any of the following acceptable?
> - Trust the packager to do a license review, with no reviewer
> verification.
Definitely need a second opinion IMHO (IANAL).
> - Trust the output of an automated tool which attempts to detect
> project licenses (such as askalono).
My understanding is that such tools are pretty accurate when a license is positively identified, and this can be a reasonable 2nd opinion. When the tool fails to find or confirm a license, then manual search may be required.
> - Trust the license tag from a project hosting service such as github?
> (I understand that the answer may depend on the hosting service.)
Ask a real lawyer. I would be inclined to not trust the service, but it might count as "due diligence".
On Friday, 24 July 2020 14:40:15 CEST Stuart D Gathman wrote:
> On Fri, 24 Jul 2020, Jason Tibbitts wrote:
> > - Trust the license tag from a project hosting service such as github?
> > (I understand that the answer may depend on the hosting service.)
> Ask a real lawyer. I would be inclined to not trust the service, but it might count as "due diligence".
I want to clarify that the tool used (askalono) does not rely on the GitHub "license field"; it works by analysing all the files and looking for license texts and SPDX tags.
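To illustrate only the SPDX-tag half of that approach, here is a minimal Python sketch. It is not askalono's implementation (askalono is a separate tool that also matches full license texts); the function name `find_spdx_tags` and the regular expression are assumptions made up for this example.

```python
import re
from pathlib import Path

# Hypothetical pattern for tags such as
#   SPDX-License-Identifier: GPL-2.0-or-later
# The character class is an assumption chosen to cover common SPDX
# expressions (identifiers, AND/OR/WITH, parentheses, "+").
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([A-Za-z0-9.+\-() ]+)")


def find_spdx_tags(root):
    """Return {relative path: [license expressions]} for files under root."""
    root = Path(root)
    results = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are ignored so binary files do not abort the scan.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        tags = [match.strip() for match in SPDX_TAG.findall(text)]
        tags = [tag for tag in tags if tag]
        if tags:
            results[path.relative_to(root).as_posix()] = tags
    return results
```

Calling `find_spdx_tags("path/to/unpacked/source")` lists the files that declare a license through a tag. Files without a tag are simply absent from the result, so a reviewer would still need to check those by other means.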
"SDG" == Stuart D Gathman <stuart@gathman.org> writes:
SDG> Ask a real lawyer.
For those in the Fedora community, the legal mailing list is as close as we can get, since at least one real lawyer (whose opinions count for Fedora) occasionally responds to this list and the Fedora community liaison with the legal team can be assumed to read it. Though at the moment I do not know who that person is.
So basically this list is, as far as we know, the only way for the packaging committee to obtain a legal opinion.
- J<
On Fri, Jul 24, 2020 at 08:40:15AM -0400, Stuart D Gathman wrote:
> On Fri, 24 Jul 2020, Jason Tibbitts wrote:
> > - Trust the output of an automated tool which attempts to detect
> > project licenses (such as askalono).
> My understanding is that such tools are pretty accurate when a license is positively identified, and this can be a reasonable 2nd opinion. When the tool fails to find or confirm a license, then manual search may be required.
The package reviewer and the person submitting the new package should be taking the time to do this part. It's tedious, but the advantage is that we can then trust our "normalized" license string that goes in the spec file as capturing the licenses that apply to that particular project.
One that I have used for package reviews is:
https://github.com/nexB/scancode-toolkit
On Friday, 24 July 2020 09:16:19 CEST Jason Tibbitts wrote:
> Are any of the following acceptable?
> - Trust the packager to do a license review, with no reviewer verification.
> - Trust the output of an automated tool which attempts to detect project licenses (such as askalono).
> - Trust the license tag from a project hosting service such as github? (I understand that the answer may depend on the hosting service.)
Hi again,
Can I get a definitive opinion from legal on this? So far I have 64 new dependencies ready to be included in Fedora, for over 1,000 packages checked so far (650 remaining). The licenses have been both autodetected by askalono and checked manually against the LICENSE files. I'd like to be able to package them before F33 branching, if possible.
Thanks,
Robert-André
On Fri, Jul 24, 2020 at 3:16 AM Jason Tibbitts <tibbs@math.uh.edu> wrote:
> One of the various reasons for having package reviews is to have a human verify that the packager's choice of License: tag is valid. The Packaging Committee was faced with a request (https://pagure.io/packaging-committee/issue/1007) that has us questioning just how much license review is required.
> Are any of the following acceptable?
> - Trust the packager to do a license review, with no reviewer verification.
It seems like two different things may be getting conflated here:
1. Review of a package to determine whether it satisfies Fedora licensing policy.
2. Choice of what to put for the License: tag -- assuming 1 has been done.
I don't have a view on whether the existing approach of having a human reviewer is absolutely needed. However, a general point I'd make is that Red Hat has been assuming a certain level of very high quality in community legal review of new Fedora packages -- this assumption is baked into certain internal processes we have at Red Hat for RHEL -- and the additional human review probably contributes to this. If we relax some elements of Fedora legal review perhaps we'll need to introduce others to compensate for this, whether on the Fedora side or the RHEL side, or both.
On the other hand, that observation doesn't really apply to the License: tags themselves. I would say we (or I, anyway) don't really find the Fedora License: tags that helpful to begin with because they are generally not super-accurate (by my own standards, at least) and there seems to be substantial inconsistency in how they are selected across different packages and reviewers/maintainers.
> - Trust the output of an automated tool which attempts to detect project licenses (such as askalono).
In general, license scanning tools are pretty bad, probably unavoidably so, since there are limits to how much you can automate license detection. The best or least bad one, and the only one I would personally vouch for, is ScanCode (mentioned by David Cantrell in his response). I would encourage Fedora to consider some sort of formal expectation for the use of ScanCode to aid in one or both of the distinct tasks noted above (review for conformance to Fedora licensing policy, decision for choice of License: tag). But even with a high quality tool like ScanCode you can't "trust" its output for purposes of task 1 (whereas its output might be okay enough to be used in a fairly mechanical way for task 2).
> - Trust the license tag from a project hosting service such as github? (I understand that the answer may depend on the hosting service.)
This is just another license scanning tool. There's no particular reason to "trust" it any more than use of non-hosted tools. My impression in the past was that the GitHub license identification was based on the licensee tool which seemed to be pretty primitive and naive in its assumptions. Maybe that's good enough for task 2, but not for task 1, IMO.
Incidentally I also would encourage Fedora to look into the potential for collaboration with the ClearlyDefined project (https://clearlydefined.io/) which is currently not oriented towards Linux distributions or RPM-based packages at all. I could imagine a future where ClearlyDefined could be helpful in both tasks 1 and 2 identified above.
Richard
On Fri, Jul 31, 2020 at 1:59 PM Richard Fontana rfontana@redhat.com wrote:
> > - Trust the output of an automated tool which attempts to detect project licenses (such as askalono).
> In general, license scanning tools are pretty bad, probably unavoidably so, since there are limits to how much you can automate license detection. The best or least bad one, and the only one I would personally vouch for, is ScanCode (mentioned by David Cantrell in his response).
One thing I've been slowly working on getting packaged up and usable in Fedora is openSUSE's Cavil tool[1], which does not make a judgement on licenses, per se, but does a deep scan similar to our licensecheck tool and presents the output in a much more understandable form. There's also some pattern matching and confidence interval stuff which can help in determining what the effective license is for the project, which is something that human reviewers tend to struggle with.
The problem with most license scanning tools is that they try to "judge" the result based on a very limited set of heuristics, mostly based on license file detection. As we know, this is insufficient for figuring out the true nature of a project's licensing, and this is one thing Cavil is better at handling.
Perhaps this could help with making package reviews easier to go through for legal review stuff.
[1]: https://github.com/openSUSE/cavil