Hi,
Has anyone ever used trivy [1] to scan for licenses? It appears more robust and better maintained than askalono-cli and can detect files with multiple licenses and licenses embedded in file headers. I have been running it with "trivy fs --scanners license --license-full ."
[1] https://github.com/aquasecurity/trivy
Hi Maxwell:
On Sun, Mar 3, 2024, Maxwell G wrote:
Has anyone ever used trivy [1] to scan for licenses? It appears more robust and better maintained than askalono-cli and can detect files with multiple licenses and licenses embedded in file headers. I have been running it with "trivy fs --scanners license --license-full ."
IMHO, from trying it, trivy is not a robust tool for license detection.
It is mostly based on google/licenseclassifier, which has had a single commit in the last 17 months, so it is no better maintained than askalono (and frankly both are fairly lightweight tools for license detection). Trivy adds SPDX expression parsing on top of google/licenseclassifier and that's it. I would not rely on these for anything serious, and certainly not to scan code for licenses prior to its inclusion in Fedora.
If you want robust license detection, consider using ScanCode [2], and ScanCode.io [3] for more complex pipelines. Both are tools that I co-maintain and are considered better tools for this. Do not hesitate to reach out for help!
Not directly related, but I just found out that ScanCode has been used for building large code LLMs [4].
[1] https://github.com/google/licenseclassifier
[2] https://github.com/nexB/scancode-toolkit
[3] https://github.com/nexB/scancode.io
[4] https://huggingface.co/papers/2402.19173
-- Cordially Philippe Ombredanne
On Sun Mar 3, 2024 at 20:22 +0100, Philippe Ombredanne wrote:
Hi Maxwell:
Hi Philippe,
On Sun, Mar 3, 2024, Maxwell G wrote:
Has anyone ever used trivy [1] to scan for licenses? It appears more robust and better maintained than askalono-cli and can detect files with multiple licenses and licenses embedded in file headers. I have been running it with "trivy fs --scanners license --license-full ."
IMHO, from trying it, trivy is not a robust tool for license detection.
I am not necessarily looking for the most robust tool for license detection. I am just looking for a relatively performant and reasonably accurate tool to scan a tree of Go modules for license files for the go-vendor-tools [1] project that I am working on.
I evaluated askalono-cli and trivy for this purpose, and they both fulfil those criteria. I implemented support for both of them and added an option to choose which to use.
The Fedora legal docs describe askalono as:
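As a rough sketch of what choosing between the two backends could look like (this is not the actual go-vendor-tools code): the trivy arguments are the ones quoted earlier in this thread, "askalono crawl" is assumed here as the askalono subcommand for scanning a directory tree, and the function and table names are hypothetical.

```python
# Hypothetical mapping from backend name to the command-line prefix used
# to scan a directory. The directory is appended as the last argument.
SCANNER_COMMANDS = {
    "trivy": ["trivy", "fs", "--scanners", "license", "--license-full"],
    "askalono": ["askalono", "crawl"],
}


def scanner_command(backend: str, directory: str) -> list[str]:
    """Build the argument list for the selected license scanner."""
    try:
        prefix = SCANNER_COMMANDS[backend]
    except KeyError:
        choices = ", ".join(sorted(SCANNER_COMMANDS))
        raise ValueError(f"unknown backend {backend!r}; choose one of: {choices}")
    # Return a new list so callers cannot mutate the shared table.
    return [*prefix, directory]
```

The resulting list can be handed to subprocess.run() as-is; parsing each tool's output is a separate step and is not shown.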
It is most useful for quick analysis of packages coming out of ecosystems featuring projects known to have (1) highly standardized approaches to layout of license information (it specifically looks only for files that are named LICENSE or COPYING or some obvious variant on those), (2) generally simple license makeup, and (3) cultural preferences for a highly limited set of licenses (for example, Rust crates that don’t bundle legacy C code, Go modules, or Node.js npm packages).
That is exactly what I am using it for. Trivy does a better job at detecting license file paths than askalono and can also handle files with multiple licenses and some license headers. My code already checks that there is at least one license file in each Go module, so if one is missing, the go-vendor-tools license checker will fail and require the user to take manual action.
The Go ecosystem is relatively standardized in terms of licensing, so I do not feel the need to use a tool like scancode, which analyzes every single file and takes a very long time to run.
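To illustrate the "at least one license file per module" check, here is a minimal sketch. It is not the go-vendor-tools implementation: the function names and the list of filename prefixes are assumptions made for the example. It relies only on the fact that vendor/modules.txt lists each vendored module on a line of the form "# <module path> <version>".

```python
from pathlib import Path

# Assumed filename prefixes that count as a license file (case-insensitive).
LICENSE_PREFIXES = ("license", "licence", "copying", "notice")


def vendored_modules(modules_txt: str) -> list[str]:
    """Return the module paths listed in the text of vendor/modules.txt."""
    modules = []
    for line in modules_txt.splitlines():
        # Module lines start with "# "; "## explicit" annotations and
        # plain package lines are ignored.
        if line.startswith("# "):
            fields = line.split()
            if len(fields) >= 2:
                modules.append(fields[1])
    return modules


def has_license_file(module_dir: Path) -> bool:
    """Check for a top-level file whose name starts with a license prefix."""
    return any(
        entry.is_file() and entry.name.lower().startswith(LICENSE_PREFIXES)
        for entry in module_dir.iterdir()
    )


def modules_missing_license(vendor_dir: Path) -> list[str]:
    """List vendored modules that have a directory but no license file."""
    text = (vendor_dir / "modules.txt").read_text(encoding="utf-8")
    missing = []
    for module in vendored_modules(text):
        module_dir = vendor_dir / module
        # Modules with no vendored packages have no directory; skip them.
        if module_dir.is_dir() and not has_license_file(module_dir):
            missing.append(module)
    return missing
```

A checker built on this would fail whenever modules_missing_license() returns a non-empty list, leaving the user to locate the license manually.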
It is mostly based on google/licenseclassifier which had a single commit in the last 17 months, and this means this is not more maintained than askalono (and frankly both are fairly lightweight tools for license detection).
<snip>
I would not rely on these for anything serious and certainly not to scan code for license prior to its inclusion in Fedora.
I am striving to be "reasonably sure" that all license texts are accounted for, as opposed to spending 45 minutes performing a detailed license review of each package.
If you want robust license detection, consider using ScanCode [2] and Scancode.io [3] for more complex pipelines. Both are tools that I co-maintain and are considered as better tools for this. Do not hesitate to reach out for help!
I will definitely spend more time playing with scancode-toolkit, but I worry about the amount of time it takes to run on a large Go vendor tree and that it has not been packaged for Fedora yet; it has a lot of Python dependencies. I opened [2] to track implementing a scancode backend for go-vendor-tools. I will be sure to let you know if I have any questions!
[1] https://gitlab.com/gotmax23/go-vendor-tools/
[2] https://gitlab.com/gotmax23/go-vendor-tools/-/issues/15
Thanks, Maxwell