Hi all,
I have fixed up the script a bit and sprinkled some more processing on
top, so I am coming back with more precise data (still based on rpm-specs
as of 2022-11-14, as in the previous email).
The state is as follows:
* Total of rubygem packages in Fedora:
~~~
$ ls rubygem-*.spec | wc -l
495
~~~
* Total of packages I ran the script against (subtract 1 line for the header):
~~~
$ cat rubygems_fedora_spdx_state.csv | wc -l
487
~~~
* Ruby gems where the /Fedora License field/ and /gem2rpm/ output match
and /license-validate/ says they are OK SPDX: *291/495*
https://fedorapeople.org/cgit/jackorp/public_git/spdx_rubygems.git/tree/r...
* Ruby gems where only /license-validate/ says they are OK SPDX, but
licenses may or may not match between Fedora and upstream: *334/495*
https://fedorapeople.org/cgit/jackorp/public_git/spdx_rubygems.git/tree/r...
* Ruby gems where the "or" and "and" are converted to "OR" and "AND" and
/license-validate/ says they are OK SPDX: *337/495*
~~~
$ cat rubygems_try_convert_conjunctions.csv | grep -E "(true|false);0" | wc -l
337
~~~
https://fedorapeople.org/cgit/jackorp/public_git/spdx_rubygems.git/tree/r...
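The conjunction conversion in the last item can be sketched roughly as follows. This is only a minimal illustration, not the script from the repository: the function name `normalize_conjunctions` is hypothetical, and it assumes the conjunctions are separated from the identifiers by whitespace, so that the "or" inside an identifier such as "GPL-2.0-or-later" is left alone.

```python
import re

# Match "and"/"or" only when they stand alone between whitespace,
# so hyphenated identifiers such as "GPL-2.0-or-later" are not touched.
_CONJUNCTION = re.compile(r"(?<!\S)(and|or)(?!\S)", re.IGNORECASE)


def normalize_conjunctions(expression: str) -> str:
    """Uppercase standalone "and"/"or" in a license expression (hypothetical helper)."""
    return _CONJUNCTION.sub(lambda m: m.group(1).upper(), expression)


if __name__ == "__main__":
    print(normalize_conjunctions("MIT or GPL-2.0-or-later"))
    print(normalize_conjunctions("(MIT and BSD-2-Clause) or Ruby"))
```

The result would still have to go through /license-validate/, since uppercasing the conjunctions says nothing about whether the identifiers themselves are valid.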
I will follow up with RPM specs from this week later.
Regards,
Jarek
On 11/21/22 14:25, Jarek Prokop wrote:
Hi,
I have been working on the validation of Rubygem licenses with the
SPDX format.
All work done so far lives in my fedorapeople space:
https://fedorapeople.org/cgit/jackorp/public_git/spdx_rubygems.git/tree/
It is WIP, including the scripts. (I got a bit sidetracked with
validating MIT variations.)
Note that you have to be in the directory into which
`rpm-specs-latest.tar.xz` was unpacked.
From the total of about 500 rubygems (and with my incomplete script)
we have 232 SPDX licenses:
~~~
$ cat current_ok_rb.csv | wc -l
232
~~~
The real count is probably higher: some gems are simply MIT or already
have correct SPDX, but they are older and do not include the license
metadata that I could compare against.
See this list for complete info:
https://fedorapeople.org/cgit/jackorp/public_git/spdx_rubygems.git/tree/r...
You can see the `current_ok_rb.csv` in the same repository.
So far, I have validated mostly MIT (as those are the easiest to get
right). I will need to fix the script before my next update to account
for other licenses too.
If your gem has multiple valid SPDX license identifiers that are joined
with "and" / "or" instead of "AND" / "OR", the script will be able to
catch that, and I'll make a PR to the gem with the converted form should
that be the case.
There are also a few gems excluded from the search as they do not have
a gem in Fedora's sources cache.
These gems are: rubygem-morph-cli rubygem-krb5-auth
rubygem-asciidoctor rubygem-rgen rubygem-net-irc
I have not run the script against them; they are explicitly excluded,
since so far I am only comparing the RPM spec against gem2rpm output.
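For illustration, comparing the License field of a Fedora spec with the one in gem2rpm output could look roughly like the sketch below. It is not the script from the repository: `extract_license` and `licenses_match` are hypothetical names, and the sketch assumes both inputs are spec texts with a plain `License:` tag and only looks at the first such tag.

```python
import re

# First "License:" tag in a spec file; RPM tag names are case-insensitive.
_LICENSE_TAG = re.compile(r"^License:[ \t]*(.+?)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def extract_license(spec_text):
    """Return the value of the first License tag, or None if there is none."""
    match = _LICENSE_TAG.search(spec_text)
    return match.group(1) if match else None


def licenses_match(fedora_spec, gem2rpm_spec):
    """True when both specs carry a License tag with the same value."""
    fedora = extract_license(fedora_spec)
    upstream = extract_license(gem2rpm_spec)
    return fedora is not None and fedora == upstream


if __name__ == "__main__":
    fedora = "Name: rubygem-foo\nLicense:        MIT\nSummary: Example\n"
    generated = "Name: rubygem-foo\nLicense: MIT\n"
    print(licenses_match(fedora, generated))
```

An exact string comparison is deliberately strict here: expressions that are equivalent but written differently would be reported as mismatches and need a manual look.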
== Plan ==
I plan to do this in 2 main phases:
1) Initial pass
This is just a pass through the packages to note all packages that
are already SPDX compatible.
2a) Inspecting and fixing
This will probably be mostly manual work to identify the correct
licenses for gems that do not have a license in the gemspec metadata.
Also convert the Callaway convention to the correct SPDX identifier.
This cannot be reliably automated; for example,
the BSD license can be converted to 4 different SPDX license
identifiers. However, tooling can give an idea of what the licensing
probably is.
At that point I will be mainly looking at the Fedora spec's license and
whether it has valid SPDX or not.
2b) Validate MIT licenses
The MIT license has multiple variants. While it mostly seems that the
gems' MIT licenses are what SPDX also considers MIT, I'd like to
validate that assumption.
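To illustrate why the Callaway conversion in phase 2a needs manual review, here is a minimal sketch of a lookup that only suggests candidates. The table is an assumption made for this example and is far from complete; in particular, the BSD entry lists just two of the possible SPDX identifiers and is not the list of four mentioned above. The names `CALLAWAY_CANDIDATES`, `spdx_candidates` and `needs_manual_review` are hypothetical.

```python
# Illustrative, incomplete table: Callaway short name -> possible SPDX identifiers.
CALLAWAY_CANDIDATES = {
    "ASL 2.0": ["Apache-2.0"],
    "GPLv2+": ["GPL-2.0-or-later"],
    # Ambiguous: the right answer depends on the actual license text.
    "BSD": ["BSD-2-Clause", "BSD-3-Clause"],
}


def spdx_candidates(callaway_name):
    """Return the SPDX identifiers a Callaway name might stand for."""
    return list(CALLAWAY_CANDIDATES.get(callaway_name, []))


def needs_manual_review(callaway_name):
    """Anything without exactly one candidate has to be checked by hand."""
    return len(spdx_candidates(callaway_name)) != 1


if __name__ == "__main__":
    for name in ("ASL 2.0", "BSD", "Something else"):
        print(name, spdx_candidates(name), needs_manual_review(name))
```

Tooling like this can narrow things down, but for names such as BSD the license text in the gem still has to be read.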
== How to help ==
Make sure your rubygem package has a license in Fedora's specfile that
is a valid SPDX license identifier.
You can also take a look at the gems where the license does not match
between gem2rpm and Fedora, and fix them. (RubyGems should use SPDX;
even though it is an older version, that's a good pointer for us.)
On a closing note, be wary of licenses like "LGPL-2.1+": it is valid,
but deprecated by the current SPDX version, and the `license-validate`
tool won't accept it.
See https://spdx.org/licenses/ for the complete license list, including
the list of deprecated identifiers.
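As a small example of dealing with the deprecated form: on the current SPDX list, the "+" variants of the GNU licenses are replaced by "-or-later" identifiers, so "LGPL-2.1+" becomes "LGPL-2.1-or-later". The sketch below handles only that one pattern; the function name `modernize_gnu_identifier` is hypothetical, and other deprecated identifiers are not covered.

```python
import re

# Deprecated GNU identifiers with a trailing "+", e.g. "LGPL-2.1+".
_GNU_PLUS = re.compile(r"^(AGPL|GPL|LGPL)-(\d+\.\d+)\+$")


def modernize_gnu_identifier(identifier):
    """Rewrite "<family>-<version>+" as "<family>-<version>-or-later"."""
    match = _GNU_PLUS.match(identifier)
    if match is None:
        return identifier  # not the deprecated "+" form, leave untouched
    family, version = match.groups()
    return f"{family}-{version}-or-later"


if __name__ == "__main__":
    print(modernize_gnu_identifier("LGPL-2.1+"))
    print(modernize_gnu_identifier("MIT"))
```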
Thanks,
Jarek