balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi, I am trying to add a plug-in for a new output format. It used to work in previous versions (v3.2.1rc2), but with the latest released version it gives me a "Missing output option(s): at least one output option is required to save scan results." error. Is there any specific change in the latest version that I should look into?
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj hum... is this public code?

There have been quite a few changes since 3.2.1rc2, over 1000 commits:
https://github.com/nexB/scancode-toolkit/compare/v3.2.1rc2...develop

Showing 25,888 changed files with 281,590 additions and 385,844 deletions.

Philippe Ombredanne
@pombredanne

@balakrishna-mukundaraj that said, the key change seems to be in how the output options are declared:


@output_impl
class JsonPrettyOutput(OutputPlugin):

    options = [
        CommandLineOption(('--json-pp', 'output_json_pp',),
            type=FileOptionType(mode=mode, lazy=True),
            metavar='FILE',
            help='Write scan output as pretty-printed JSON to FILE.',
            help_group=OUTPUT_GROUP,
            sort_order=10),
    ]

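which now becomes (CommandLineOption is replaced by PluggableCommandLineOption, and FileOptionType now uses mode='w' with encoding='utf-8'):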

@output_impl
class JsonPrettyOutput(OutputPlugin):

    options = [
        PluggableCommandLineOption(('--json-pp', 'output_json_pp',),
            type=FileOptionType(mode='w', encoding='utf-8', lazy=True),
            metavar='FILE',
            help='Write scan output as pretty-printed JSON to FILE.',
            help_group=OUTPUT_GROUP,
            sort_order=10),
    ]
ak-iitb
@ak-iitb
Hi, I have a very basic question: does scancode detect license files in the source code, or does it generate them by looking at the libraries used in the code? For example, I am building something in Java and I use multiple open source libraries. If I scan my code with scancode, would it give me the list of libraries used and the licenses associated with them?
Philippe Ombredanne
@pombredanne

@guddutopper yes and no.
So the --package option will detect packages and report their dependencies (say, in a pom.xml). You will get the list this way, at least the list of direct dependencies.

It will not (yet) resolve or fetch the dependency tree to analyze it.
The dependencies would have to be in the scanned directory to be analyzed.
They would likely need to be extracted first with extractcode too, at least for now.

So if you scan your built, as-deployed app, you should normally have them all.
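To illustrate what "the list of direct dependencies" declared in a pom.xml means, here is a minimal standard-library sketch that reads the `<dependency>` entries of a Maven POM. It is not how scancode's --package detection is implemented, and the helper name `direct_dependencies` is hypothetical. Like scancode today, it does not resolve or fetch the transitive tree, and it also ignores properties and parent POMs.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Maven POMs use this default XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def direct_dependencies(pom_xml: str) -> list[tuple[str, str, str]]:
    """Return (groupId, artifactId, version) for each dependency declared
    directly under <project><dependencies>. Hypothetical helper: no property
    interpolation, no parent POMs, no transitive resolution."""
    root = ET.fromstring(pom_xml)
    deps = []
    # <dependencyManagement> entries are deliberately not included here.
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default="", namespaces=POM_NS)
        deps.append((group, artifact, version))
    return deps


if __name__ == "__main__":
    sample = """<project xmlns="http://maven.apache.org/POM/4.0.0">
      <dependencies>
        <dependency>
          <groupId>jaxen</groupId>
          <artifactId>jaxen</artifactId>
          <version>1.1.3</version>
        </dependency>
      </dependencies>
    </project>"""
    for coords in direct_dependencies(sample):
        print(":".join(coords))  # jaxen:jaxen:1.1.3
```

Anything deeper than this first level only shows up in a scan if the dependency JARs are themselves present (and extracted) in the scanned tree.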
ak-iitb
@ak-iitb
@pombredanne yes, I have to extract the JAR files using the extractcode command, and then it is able to generate a report of the licenses and copyrights. However, it gives one combined report. Could it provide the license and copyright report per JAR file?
Philippe Ombredanne
@pombredanne
@guddutopper that's a great idea for a new feature request issue.
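Until such a feature exists, one possible workaround is to post-process the combined JSON output (for example from --json-pp). The sketch below groups license keys and copyright statements by the `<name>.jar-extract` directory that extractcode creates. It assumes a JSON layout with a top-level `files` list whose entries have `path`, `licenses` (each with a `key`) and `copyrights` (each with `copyright`, or `value` in older output). These field names vary between scancode versions, so check them against your own scan. The function names are hypothetical.

```python
from __future__ import annotations

import json
import sys
from collections import defaultdict

# extractcode writes the contents of foo.jar into a "foo.jar-extract" directory.
EXTRACT_SUFFIX = "-extract"


def jar_of(path: str) -> str | None:
    """Return the outermost JAR a scanned path came from, or None."""
    for segment in path.split("/"):
        if segment.endswith(".jar" + EXTRACT_SUFFIX):
            return segment[: -len(EXTRACT_SUFFIX)]
    return None


def per_jar_summary(scan: dict) -> dict[str, dict[str, set[str]]]:
    """Group license keys and copyright statements by originating JAR.

    Field names ("files", "path", "licenses"/"key", "copyrights"/"copyright"
    or "value") are assumptions about the scancode JSON layout.
    """
    summary = defaultdict(lambda: {"licenses": set(), "copyrights": set()})
    for entry in scan.get("files", []):
        jar = jar_of(entry.get("path", ""))
        if jar is None:
            continue  # file is not under any *.jar-extract directory
        for lic in entry.get("licenses") or []:
            if lic.get("key"):
                summary[jar]["licenses"].add(lic["key"])
        for cop in entry.get("copyrights") or []:
            text = cop.get("copyright") or cop.get("value")
            if text:
                summary[jar]["copyrights"].add(text)
    return dict(summary)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        report = per_jar_summary(json.load(f))
    for jar, found in sorted(report.items()):
        print(jar)
        print("  licenses:", ", ".join(sorted(found["licenses"])) or "-")
        for statement in sorted(found["copyrights"]):
            print("  copyright:", statement)
```

Saved as a script and given the scan JSON path as its only argument, it prints one block per JAR. Nested JARs are attributed to the outermost JAR, which is a design choice you may want to change.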
ak-iitb
@ak-iitb
@pombredanne I had 120 JAR files and extracted them all using extractcode. Now when I run the scancode command, it has been stuck for 3 hours.
Philippe Ombredanne
@pombredanne
:|
@guddutopper how many --processes?
-n4.... :|
You may want to start with scancode --package -n4 and no -cl (--copyright --license) yet?
ak-iitb
@ak-iitb
@pombredanne I did not understand: should I not use the -cl option? I just want to scan the extracted JAR files (say 100) to get the license and copyright info. I don't need any other details such as class/file type/file name, etc. What should the command be for that?
Philippe Ombredanne
@pombredanne
@guddutopper I was asking if you could first try scancode --package -n4 --json-pp <your scan file name>.json <path to the extracted files>, to focus on the package manifests.
ak-iitb
@ak-iitb
@pombredanne so this JSON file would be used as input the next time I run scancode with the -cl option, to speed up the process?
Philippe Ombredanne
@pombredanne
@guddutopper nope, this is more to get a quicker set of results.
Otherwise, I am puzzled as to why a scan of 120 JARs would take so long.
ak-iitb
@ak-iitb
@pombredanne using --package helped. Could you also tell me a way to remove the file information table from the report? I only want the license and copyright info, but the file info is reported automatically. I have only used the -pcl option.
Philippe Ombredanne
@pombredanne
@guddutopper which report?
Roshan Thomas
@Thomshan
I believe @guddutopper is referring to the HTML report. This was a problem for me as well, since the file information table in the HTML report is massive for medium to large projects. MS Edge would constantly crash for me because there was just so much to render. As a workaround, I finally wrote a script to remove all lines pertaining to the file information table from the HTML file.
Philippe Ombredanne
@pombredanne
@Thomshan @guddutopper ok, the HTML report is always going to be limited. With this volume of data, the JSON output is really more appropriate, as input to the workbench.
Alternatively, we could write a plugin to help filter things out. In any case, an issue is most welcome :P
Roshan Thomas
@Thomshan
I have a question about scanning ".class" files. If I'm not mistaken, ".class" files do not retain any comments (so there won't be any license text). I ran scancode on a JAR (jaxen-1.1.3.jar) and it reported a proprietary license in one of the .class files of this JAR (jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class). I used an IDE to investigate and couldn't find anything at the line number reported by scancode. Any idea how or why this could have happened?
Philippe Ombredanne
@pombredanne

@Thomshan try this:
scancode --license --license-text --license-text-diagnostics --json-pp jaxen.json jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class

the results:

          "start_line": 134,
          "end_line": 134,
          "matched_rule": {
            "identifier": "proprietary-license_276.RULE",
            "license_expression": "proprietary-license",
            "licenses": [
              "proprietary-license"
            ],
....
            "matcher": "2-aho",
            "rule_length": 4,
            "matched_length": 4,
            "match_coverage": 100.0,
            "rule_relevance": 100
          },
          "matched_text": "may not be modified"

and ...

$ strings jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class | grep -A2 -B2 "may not be modified"
org/jaxen/dom/NamespaceNode
org/w3c/dom/DOMException
"Namespace node may not be modified
org.w3c.dom.Node
java/lang/Class

The thing is that: 1. class files can contain copyright and license text in string literals; 2. scancode does collect these strings from binaries.

The fix is going to be a new false-positive rule, with this in its .yml file:

    is_false_positive: yes
    notes: not a license reference, seen in Jaxen

and this text in its .RULE file:

    node may not be modified

Note that on the sources JAR (https://repo1.maven.org/maven2/jaxen/jaxen/1.1.3/jaxen-1.1.3-sources.jar), this will also match the same text:
$ scancode --license --license-text --license-text-diagnostics --json-pp - jaxen-1.1.3-sources.jar-extract/org/jaxen/dom/NamespaceNode.java
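The `strings` check above can be approximated in Python to see how a string literal in a .class file ends up as scannable text. This sketch extracts runs of at least four printable ASCII bytes, roughly like the default behavior of the Unix `strings` tool; it is an illustration only, not scancode's own binary string extraction, and `printable_strings` is a hypothetical name. The byte fragment is hand-made and shaped like class-file constant-pool entries (tag `0x01`, then a two-byte length, then the UTF-8 bytes); it is not a valid class file. The length byte `0x22` (34, the length of the text) happens to be the printable `"` character, which is why `strings` printed a leading quote.

```python
from __future__ import annotations

import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes (0x20-0x7e)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Class-file magic number, then two CONSTANT_Utf8-style entries:
# tag 0x01, two-byte big-endian length, then the string bytes.
fragment = (
    b"\xca\xfe\xba\xbe"
    + b"\x01\x00\x22" + b"Namespace node may not be modified"
    + b"\x01\x00\x10" + b"org.w3c.dom.Node"
)

for s in printable_strings(fragment):
    print(s)
# "Namespace node may not be modified
# org.w3c.dom.Node
```

Once such a literal is extracted, a short license rule like "may not be modified" can match it just as it would in source code.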
Philippe Ombredanne
@pombredanne
Using --license-text to get the matched text lines (and, optionally, --license-text-diagnostics to get only the strictly matched words) is useful to find possible errors.
Also, the relevance of proprietary-license_276.RULE should NOT be 100 but rather 70, as this is a short and not very conclusive rule, as your problem shows.
Would you mind drafting a ticket with all these details?
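To look for similar cases across a whole scan, the JSON produced with --license --license-text can be filtered for short rules that still carry full relevance. This is a sketch under stated assumptions: it expects the layout of the excerpt above (`files[].licenses[]` with `start_line`, `end_line`, `matched_text` and a `matched_rule` holding `identifier`, `rule_length` and `rule_relevance`), and the threshold `SHORT_RULE_TOKENS` and the function names are hypothetical choices, not scancode settings.

```python
from __future__ import annotations

import json
import sys

# Hypothetical cut-off: rules with this many tokens or fewer count as "short".
SHORT_RULE_TOKENS = 5


def license_matches(scan: dict) -> list[dict]:
    """Flatten license matches from a --license --license-text JSON scan."""
    rows = []
    for entry in scan.get("files", []):
        for match in entry.get("licenses") or []:
            rule = match.get("matched_rule") or {}
            rows.append({
                "path": entry.get("path"),
                "lines": (match.get("start_line"), match.get("end_line")),
                "rule": rule.get("identifier"),
                "rule_length": rule.get("rule_length") or 0,
                "relevance": rule.get("rule_relevance"),
                "text": match.get("matched_text", ""),
            })
    return rows


def short_full_relevance(rows: list[dict]) -> list[dict]:
    """Short rules at relevance 100: candidates to review for false positives."""
    return [r for r in rows
            if r["rule_length"] <= SHORT_RULE_TOKENS and r["relevance"] == 100]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        rows = license_matches(json.load(f))
    for r in short_full_relevance(rows):
        start, end = r["lines"]
        print(f'{r["path"]}:{start}-{end} {r["rule"]} -> "{r["text"]}"')
```

Run on the jaxen scan, this would flag the proprietary-license_276.RULE match on line 134, with the matched text shown so it can be reviewed by hand.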
Roshan Thomas
@Thomshan
Got it. Sure, I'll draft a ticket. Thank you.
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi, I am trying to run scancode-toolkit 21.6.7 in Docker, but it fails every time with the error below:

ERROR: Cannot install scancode-toolkit==21.6.7 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

The conflict is caused by:
scancode-toolkit 21.6.7 depends on pygments
The user requested (constraint) pygments

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

The command '/bin/sh -c ./scancode --help' returned a non-zero code: 1

Is there something I can do to resolve this issue?
Philippe Ombredanne
@pombredanne
hum
@balakrishna-mukundaraj would you mind entering an issue? That's a bug for sure.
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi @pombredanne, please find the bug details here:
Docker fails to run on scancode-toolkit 21.6.7 #2554
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj thanks! I am looking into this
balakrishna-mukundaraj
@balakrishna-mukundaraj

Hi @pombredanne

While scanning a source, we had an EPL license match on this header:

/*
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Public License 2.0 which is available at
 * http://www.eclipse.org/legal/epl-2.0
 *
 * SPDX-License-Identifier: EPL-2.0
 */
Even though the block clearly says it is an EPL license, scancode reported epl-2.0 OR apache-2.0 (matched by the rule epl-2.0_or_apache-2.0_2.RULE). Is there any way to fix this? There is no Apache license anywhere in the file.
In some cases it also matches epl-2.0_or_apache-2.0_or_gpl-2.0_with_openjdk-exception.RULE when only an EPL license is present.
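As a quick cross-check for cases like this, the explicit `SPDX-License-Identifier` tag in a header can be read directly and compared with the detected expression (here `EPL-2.0` versus `epl-2.0 OR apache-2.0`). This minimal sketch is not part of scancode, and `spdx_identifiers` is a hypothetical name; it only collects the text after each tag, up to the end of the line, ignoring trailing blanks and a closing `*/`.

```python
from __future__ import annotations

import re

# "SPDX-License-Identifier: <expression>" up to end of line, dropping
# trailing blanks and an optional closing "*/" of a C-style comment.
_SPDX_TAG = re.compile(
    r"SPDX-License-Identifier:[ \t]*(\S.*?)[ \t\r]*(?:\*/)?[ \t\r]*$",
    re.MULTILINE,
)


def spdx_identifiers(text: str) -> list[str]:
    """Return every license expression declared with an SPDX tag in text."""
    return [m.group(1) for m in _SPDX_TAG.finditer(text)]


header = """/*
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Public License 2.0 which is available at
 * http://www.eclipse.org/legal/epl-2.0
 *
 * SPDX-License-Identifier: EPL-2.0
 */"""

print(spdx_identifiers(header))  # ['EPL-2.0']
```

A mismatch between this declared expression and the detected one is a good signal that a rule matched more than the header actually says.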
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj that's a bug :) would you mind entering an issue? This is fairly easy to fix.
Sarita Singh
@itssingh
@pombredanne How can I get the complete text of a license detected in a code file?
Henrik Sandklef
@hesa

Scancode (thanks for developing it) generates SPDX version 2.1 (--spdx-tv or --spdx-rdf) and has spdx-tools 0.6.1 as a requirement (listed in requirements.txt):

  • Any plans to move to SPDX 2.2?
  • Using examples/parse_rdf.py from tools-python 0.6.1 on an RDF file generated by Scancode, I get some errors (see below). What am I doing wrong?

Errors:
SPDXID must be "SPDXRef-[idstring]" where [idstring] is a unique string containing letters, numbers, ".", "-".
More than one File checksum defined.
More than one file copyright text defined.
Errors while parsing
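The first error can be checked mechanically. The sketch below encodes only the rule quoted in that message: the `SPDXRef-` prefix, then letters, digits, `.` or `-`, unique within the document. It is a hypothetical helper, not part of tools-python or scancode, and it does not address the checksum and copyright-text errors. `sanitize` shows one possible way to repair an ID; that replacement strategy is an assumption, not what either tool does.

```python
from __future__ import annotations

import re

# Per the error message: "SPDXRef-" followed by letters, numbers, "." or "-".
_SPDXID = re.compile(r"SPDXRef-[A-Za-z0-9.\-]+")


def invalid_spdx_ids(ids: list[str]) -> list[str]:
    """Return IDs that are malformed or repeat an earlier ID."""
    seen: set[str] = set()
    bad = []
    for spdx_id in ids:
        if not _SPDXID.fullmatch(spdx_id) or spdx_id in seen:
            bad.append(spdx_id)
        seen.add(spdx_id)
    return bad


def sanitize(idstring: str) -> str:
    """Build an SPDXID, replacing disallowed characters with "-"."""
    return "SPDXRef-" + re.sub(r"[^A-Za-z0-9.\-]", "-", idstring)


print(invalid_spdx_ids(["SPDXRef-DOCUMENT", "SPDXRef-1", "SPDXRef-foo/bar.c", "SPDXRef-1"]))
print(sanitize("src/foo_bar.c"))
```

Running the validator over the IDs in a generated document shows which entries trigger the first error.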

Philippe Ombredanne
@pombredanne
Dear @hesa (thanks for stopping by!) ... SPDX 2.2 support is in the works at https://github.com/spdx/tools-python, which I also maintain. There are quite a few WIP bits that I merged, and I am about to make a release soon enough :)
Note that if you want to chip in with a helping hand, you will never be turned away :D
Henrik Sandklef
@hesa
I'd love to join, but I can't right now. Do you have a dev guide?
@pombredanne, do you have any idea what I am doing wrong when parsing a scancode-produced SPDX report, as described above?