Philippe Ombredanne
@pombredanne
@guddutopper nope, this is more to get a quicker set of results.
I am worried otherwise as to why a scan of 120 jars would take so long
ak-iitb
@ak-iitb
@pombredanne using --package helped. Could you also tell me how to remove the file information table from the report? I only want the license and copyright info, but the file info is reported automatically. I have only used the -pcl option.
Philippe Ombredanne
@pombredanne
@guddutopper which report?
Roshan Thomas
@Thomshan
I believe @guddutopper is referring to the HTML report. This was a problem for me as well since the file information table in the HTML report is massive in medium-large sized projects. MS Edge would constantly crash for me because there was just so much to render. I finally created a script to remove all lines pertaining to the file information table in the html file as a workaround.
Philippe Ombredanne
@pombredanne
@Thomshan @guddutopper ok, the HTML report is always going to be limited. The volume of data really means the JSON is more appropriate as an input to the workbench.
Alternatively, we could get a plugin to help filter things out. In any case, an issue would be very welcome :P
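For readers who only want licenses and copyrights, the JSON output can be post-processed instead of the HTML report. Here is a minimal sketch in Python. It assumes a scan JSON with a top-level "files" list whose entries carry "path", "type", "license_expressions" and "copyrights" (each with a "value"). These field names are an assumption about the output layout, which differs between scancode versions, and the function name is made up for illustration:

```python
import json


def licenses_and_copyrights(scan):
    """Keep only path, license expressions and copyright statements per file.

    Assumes the (version-dependent) layout described in the text above.
    """
    rows = []
    for entry in scan.get("files", []):
        if entry.get("type") != "file":
            continue  # skip directories
        rows.append({
            "path": entry.get("path"),
            "license_expressions": entry.get("license_expressions", []),
            "copyrights": [c.get("value") for c in entry.get("copyrights", [])],
        })
    return rows


# Made-up sample data standing in for a real scan result.
sample = {
    "files": [
        {"path": "src", "type": "directory"},
        {
            "path": "src/Foo.java",
            "type": "file",
            "size": 1234,
            "license_expressions": ["epl-2.0"],
            "copyrights": [
                {"value": "Copyright (c) Example Corp.", "start_line": 2, "end_line": 2}
            ],
        },
    ]
}

rows = licenses_and_copyrights(sample)
print(json.dumps(rows, indent=2))
```

For a real scan, the sample dict would be replaced by json.load() on the file written with --json-pp.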
Roshan Thomas
@Thomshan
I have a question with regard to scanning ".class" files. If I'm not wrong, ".class" files do not retain any comments in them (so there won't be any license text). I used scancode to run a scan on a jar (jaxen-1.1.3.jar) and scancode reported the presence of a proprietary license in one of the .class files of this jar (jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class). I used an IDE to investigate and couldn't find anything at the line number reported by scancode. Any idea how/why this could have happened?
Philippe Ombredanne
@pombredanne

@Thomshan try this:
scancode --license --license-text --license-text-diagnostics --json-pp jaxen.json jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class

the results:

          "start_line": 134,
          "end_line": 134,
          "matched_rule": {
            "identifier": "proprietary-license_276.RULE",
            "license_expression": "proprietary-license",
            "licenses": [
              "proprietary-license"
            ],
....
            "matcher": "2-aho",
            "rule_length": 4,
            "matched_length": 4,
            "match_coverage": 100.0,
            "rule_relevance": 100
          },
          "matched_text": "may not be modified"

and ...

$ strings jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class | grep -A2 -B2 "may not be modified"
org/jaxen/dom/NamespaceNode
org/w3c/dom/DOMException
"Namespace node may not be modified
org.w3c.dom.Node
java/lang/Class

The thing is that:
1. class files can contain copyright and licenses in literals and texts.
2. scancode does collect these strings in binaries.

The fix is going to be adding a new RULE with is_false_positive: yes in its .yml, plus some notes (not a license reference, seen in Jaxen), and this text in the .RULE: node may not be modified

Note that on https://repo1.maven.org/maven2/jaxen/jaxen/1.1.3/jaxen-1.1.3-sources.jar, running
$ scancode --license --license-text --license-text-diagnostics --json-pp - jaxen-1.1.3-sources.jar-extract/org/jaxen/dom/NamespaceNode.java
will also match the same text.
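To illustrate how a string literal in a compiled file can end up in front of a license matcher, here is a rough Python imitation of the strings command used above. It is only a sketch: it is not how scancode extracts text from binaries, and the sample bytes are hand-made to look loosely like a class file constant pool (a tag byte, a two-byte length, then the text).

```python
import re


def printable_strings(data: bytes, min_length: int = 4):
    """Yield runs of printable ASCII bytes, roughly like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


text = b"Namespace node may not be modified"

# Hand-made bytes, not a real class file. The length byte before the text
# is 34, which is also the ASCII code for a double quote, so the quote
# becomes part of the extracted string.
sample = (
    b"\xca\xfe\xba\xbe\x00\x00\x00\x34"
    + b"\x01\x00" + bytes([len(text)]) + text
    + b"\x01\x00\x10org.w3c.dom.Node\x00"
)

hits = [s for s in printable_strings(sample) if "may not be modified" in s]
print(hits)
```

The comments never reach the .class file, but the exception message does, and a short rule such as "may not be modified" matches it.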
Philippe Ombredanne
@pombredanne
using --license-text to get the match text lines (and additionally and optionally --license-text-diagnostics to only get the strictly matched words) is useful to find possible errors
Also proprietary-license_276.RULE relevance should NOT be 100 but rather 70 ... as this is not a super conclusive short rule, as witnessed by your problem.
Would you mind drafting a ticket with all these details?
Roshan Thomas
@Thomshan
Got it. Sure, I'll draft a ticket. Thank you.
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi, I am trying to run scancode-toolkit version 21.6.7 on Docker, but it fails each time with the error below:

ERROR: Cannot install scancode-toolkit==21.6.7 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

The conflict is caused by:
scancode-toolkit 21.6.7 depends on pygments
The user requested (constraint) pygments

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

The command '/bin/sh -c ./scancode --help' returned a non-zero code: 1

Is there something I can do to resolve this issue?
Philippe Ombredanne
@pombredanne
hum
@balakrishna-mukundaraj would you mind entering an issue? That's a bug for sure
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi @pombredanne , please find the bug details below
Docker fails to run on scancode-toolkit 21.6.7 #2554
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj thanks! I am looking into this
balakrishna-mukundaraj
@balakrishna-mukundaraj

Hi @pombredanne

While running a scan on a source file, we had an EPL license match on this block:

/*
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Public License 2.0 which is available at
 * http://www.eclipse.org/legal/epl-2.0
 *
 * SPDX-License-Identifier: EPL-2.0
 */
Even though the block clearly says that it is an EPL license, the result from scancode said epl-2.0 OR apache-2.0 (matched from the rule epl-2.0_or_apache-2.0_2.RULE). Is there any way to fix this issue? There is no Apache license anywhere in the file.
It also matches epl-2.0_or_apache-2.0_or_gpl-2.0_with_openjdk-exception.RULE in some cases where only the EPL license is present.
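One way to spot this kind of over-broad detection is to cross-check the reported expression against the SPDX tag in the header. The Python sketch below is not part of scancode and does not reflect how its matching works. The function names are made up, and it assumes that a lowercased SPDX id equals the scancode license key, which holds for epl-2.0 here but not in general.

```python
import re

# Captures the tag value up to the end of the line or a closing "*/".
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([^\n*]+)")
OPERATORS = {"and", "or", "with"}


def declared_spdx_ids(text):
    """Return the values of all SPDX-License-Identifier tags in text."""
    return [value.strip() for value in SPDX_TAG.findall(text)]


def undeclared_keys(detected_expression, declared_ids):
    """Return license keys in the detected expression that no tag declares."""
    declared = {d.lower() for d in declared_ids}
    tokens = re.findall(r"[^\s()]+", detected_expression)
    keys = [t for t in tokens if t.lower() not in OPERATORS]
    return [k for k in keys if k.lower() not in declared]


header = """/*
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Public License 2.0 which is available at
 * http://www.eclipse.org/legal/epl-2.0
 *
 * SPDX-License-Identifier: EPL-2.0
 */"""

declared = declared_spdx_ids(header)
extra = undeclared_keys("epl-2.0 OR apache-2.0", declared)
print(declared, extra)
```

A non-empty list of extra keys flags a file worth reporting in an issue.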
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj that's a bug :) Would you mind entering an issue? This is fairly easy to fix
Sarita Singh
@itssingh
@pombredanne How can I get the complete license text of a license detected in a code file?
Henrik Sandklef
@hesa

Scancode (thanks for developing it) generates SPDX version 2.1 (--spdx-tv or --spdx-rdf) and has spdx-tools 0.6.1 as a requirement (listed in requirements.txt).

  • Any plans to move to SPDX 2.2?
  • Using examples/parse_rdf.py from tools-python 0.6.1 on an RDF generated by Scancode, I get some errors (see below). What am I doing wrong?

Errors:
SPDXID must be "SPDXRef-[idstring]" where [idstring] is a unique string containing letters, numbers, ".", "-".
More than one File checksum defined.
More than one file copyright text defined.
Errors while parsing
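The first error message spells out a format rule that is easy to check by hand. The Python sketch below encodes only the rule as quoted in that message (letters, numbers, "." and "-" after the "SPDXRef-" prefix). It is not taken from tools-python, and the function name is made up for illustration.

```python
import re

# Only the rule quoted in the error message above, nothing more.
SPDXID = re.compile(r"SPDXRef-[A-Za-z0-9.\-]+")


def is_valid_spdxid(value):
    """True if value is 'SPDXRef-' followed by letters, digits, '.' or '-'."""
    return SPDXID.fullmatch(value) is not None


for candidate in ("SPDXRef-DOCUMENT", "SPDXRef-file_1", "samples/zlib.c"):
    print(candidate, is_valid_spdxid(candidate))
```

Running the identifiers from a generated document through such a check can narrow down which entries the parser rejects.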

Philippe Ombredanne
@pombredanne
Dear @hesa (Thanks for stopping by!) ... SPDX 2.2 support is in the works at https://github.com/spdx/tools-python which I also maintain ... there are quite a few WIP bits that I merged and I am about to make a release soon enough :)
Note that if you want to chip in with a helping hand, you will never be turned away :D
Henrik Sandklef
@hesa
I'd love to join. Currently can't though. Do you have a dev guide?
@pombredanne, do you have any idea what I am doing wrong when parsing a scancode-produced SPDX report, as described above?
Philippe Ombredanne
@pombredanne
@hesa no immediate idea... would you mind filing an issue with a small example?
Henrik Sandklef
@hesa
@pombredanne on my way, thanks
@pombredanne would you like it in scancode-toolkit or tools-python?
Philippe Ombredanne
@pombredanne
that's for scancode-toolkit IMHO
Henrik Sandklef
@hesa
@pombredanne ..... double checked the issue, hope it is useful. Don't hesitate to use me as a test/review resource here
Philippe Ombredanne
@pombredanne
@hesa :+1:
Sougata das
@rijusougata13
Hi, is there any way to test whether my installation was successful? I have git cloned the repo, run ./configure.bat and run ./Scripts/activate.bat, and there were no errors!
Philippe Ombredanne
@pombredanne
@rijusougata13 then run "scancode" proper to run a scan. For instance: scancode -clipeu --json-pp - samples -n4
Henrik Sandklef
@hesa
Sorry to bug you all on this list. I have a question about nexB's license-expression. Where do I ask this?
Philippe Ombredanne
@pombredanne
@hesa you do not bug anyone. You can ask here alright :P
Henrik Sandklef
@hesa
Excellent :)
I wrote some words at the end of this issue over at maxhbr/LDBcollector#4
In short, how do I add a "translation" of a license (e.g. GPLv2 to GPL-2.0-only)? Do you have a procedure?
I have some 10-20 license expressions from Yocto that make https://github.com/vinland-technology/flict scream and shout
I'd rather add the translation to license-expression than to flict
Philippe Ombredanne
@pombredanne
me thinks....
@hesa what you call a translation is a license detection (or you could call it a normalization).
This would typically be a job for scancode-toolkit
But you can also use the license expression library for a more constrained approach
The license expression parsing operates on license "symbols" each consisting of a key (say the SPDX id or the scancode key) and one or more "aliases" that can be arbitrary strings.
Philippe Ombredanne
@pombredanne
The translation would be to use GPLv2 as an alias
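The idea of symbols with a key and aliases can be sketched in plain Python. This is a standalone illustration of the concept and does not use the license-expression library or its API. The symbol table, the function names and the mapping of "&" and "|" to AND and OR (as written in Yocto LICENSE fields) are assumptions made for the example, and only single-word aliases are handled.

```python
import re

# Hypothetical symbol table: key -> aliases (illustrative entries only).
SYMBOLS = {
    "GPL-2.0-only": ("GPLv2",),
    "GPL-2.0-or-later": ("GPLv2+",),
    "MIT": (),
}

# Assumed operator spellings, including "&" and "|".
OPERATORS = {"and": "AND", "or": "OR", "with": "WITH", "&": "AND", "|": "OR"}

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def build_alias_index(symbols):
    """Map every lowercased key and alias to its canonical key."""
    index = {}
    for key, aliases in symbols.items():
        index[key.lower()] = key
        for alias in aliases:
            index[alias.lower()] = key
    return index


def normalize(expression, symbols=SYMBOLS):
    """Rewrite aliases in a license expression to their canonical keys."""
    index = build_alias_index(symbols)
    out = []
    for token in TOKEN.findall(expression):
        low = token.lower()
        if token in ("(", ")"):
            out.append(token)
        elif low in OPERATORS:
            out.append(OPERATORS[low])
        elif low in index:
            out.append(index[low])
        else:
            raise ValueError(f"Unknown license symbol: {token!r}")
    return " ".join(out).replace("( ", "(").replace(" )", ")")


print(normalize("GPLv2 & MIT"))
```

Unknown symbols raise an error rather than passing through, so an expression that no alias covers is surfaced instead of silently kept.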