Philippe Ombredanne
@pombredanne
You could have a policy that prohibits Apache-2.0-licensed code and mine would mandate its use.
And another could state that using LGPL-licensed code is OK only if unmodified and linking dynamically.
policies, modifications, linking style are not things that can be easily determined. (Yet I wish we could do so... we are working on it ;) )
Even things that look like context-free facts are not that easy to deal with. For instance the FSF states that the GPL-2.0 is incompatible with the Apache-2.0 license, but the GPL-3.0 is compatible.
Philippe Ombredanne
@pombredanne
Yet, say that some Apache-licensed code uses a GPL-2.0-licensed tool unmodified and spawned in its own independent process. In this case the FSF may say this is OK and that there may not be a compatibility issue.
e.g. the same code when used differently may trigger compat issues or not.
I wish things would be simpler... but they are not :P
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi, I am trying to add a plug-in for a new output format. It used to work in previous versions (v3.2.1rc2), but with the latest released version it gives me a "Missing output option(s): at least one output option is required to save scan results." error. Is there any specific change in the latest version that I might have to look into?
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj hum... is this public code?

there have been quite a few changes since 3.2.1rc2:
https://github.com/nexB/scancode-toolkit/compare/v3.2.1rc2...develop
like over 1000 commits

Showing 25,888 changed files with 281,590 additions and 385,844 deletions.

Philippe Ombredanne
@pombredanne

@balakrishna-mukundaraj that said the key change seems to be


@output_impl
class JsonPrettyOutput(OutputPlugin):

    options = [
        CommandLineOption(('--json-pp', 'output_json_pp',),
            type=FileOptionType(mode=mode, lazy=True),
            metavar='FILE',
            help='Write scan output as pretty-printed JSON to FILE.',
            help_group=OUTPUT_GROUP,
            sort_order=10),
    ]

which becomes now:

@output_impl
class JsonPrettyOutput(OutputPlugin):

    options = [
        PluggableCommandLineOption(('--json-pp', 'output_json_pp',),
            type=FileOptionType(mode='w', encoding='utf-8', lazy=True),
            metavar='FILE',
            help='Write scan output as pretty-printed JSON to FILE.',
            help_group=OUTPUT_GROUP,
            sort_order=10),
    ]
ak-iitb
@ak-iitb
Hi, I have a very basic query: does scancode detect license files in the source code, or does it generate the license files by looking at the libraries used in the code? For example, I am building something in Java and I have multiple open source libraries. If I scan my code with scancode, would it give me the list of the libraries used and the licenses associated with them?
Philippe Ombredanne
@pombredanne

@guddutopper yes and no.
So the --package option will detect the packages and report dependencies (say in a pom.xml). So you will get the list in this way, at least the list of direct dependencies.

It will not (yet) resolve nor fetch the dependencies tree to analyze them.
They would have to be in the scanned dir to be analyzed.
They would likely need to be extracted first with extractcode too, at least for now.

So if you scan your built and as-deployed app, you should have them all normally
ak-iitb
@ak-iitb
@pombredanne yes, I have to extract the JAR files using the extractcode command, and then it is able to generate a report for the licenses and copyrights. However, it gives a combined report. Could it provide the license and copyright report per JAR file?
Philippe Ombredanne
@pombredanne
@guddutopper that's a great idea for an issue for a new feature.
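Until such a feature exists, a per-JAR breakdown could be approximated after the fact from the combined JSON output. The sketch below is only an illustration, not part of scancode: it assumes each entry in the scan's "files" list has a "path", a "license_expressions" list and a "copyrights" list of objects with a "value" key (field names vary between scancode versions, so check your own output), and that archives were extracted to directories ending in "-extract". The function name group_by_archive is made up for this example.

```python
from collections import defaultdict

# Assumption: extractcode wrote each archive to "<archive name>-extract".
EXTRACT_SUFFIX = "-extract"


def group_by_archive(files):
    """Group license expressions and copyrights by extracted archive.

    `files` is assumed to be the "files" list of a scancode JSON scan.
    Entries that are not under an "-extract" directory are skipped.
    """
    report = defaultdict(lambda: {"licenses": set(), "copyrights": set()})
    for entry in files:
        segments = entry.get("path", "").split("/")
        # The first path segment ending in "-extract" names the archive.
        archive = next(
            (s[: -len(EXTRACT_SUFFIX)] for s in segments if s.endswith(EXTRACT_SUFFIX)),
            None,
        )
        if archive is None:
            continue
        report[archive]["licenses"].update(entry.get("license_expressions", []))
        report[archive]["copyrights"].update(
            c["value"] for c in entry.get("copyrights", []) if c.get("value")
        )
    # Convert sets to sorted lists so the result is stable and JSON-friendly.
    return {
        name: {key: sorted(values) for key, values in data.items()}
        for name, data in report.items()
    }
```

With a saved scan, something like group_by_archive(json.load(open("scan.json"))["files"]) would return one entry per JAR, assuming the scan was run on the directory holding the extracted archives.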
ak-iitb
@ak-iitb
@pombredanne I had 120 JAR files and extracted them all using extractcode. Now when I run the scancode command, it has been stuck for 3 hours.
Philippe Ombredanne
@pombredanne
:|
@guddutopper how many --processes ?
-n4.... :|
you may want to start with a scancode --package -n4 and no -cl yet?
ak-iitb
@ak-iitb
@pombredanne I did not understand, should I not use the -cl option? I just want to scan the extracted JAR files (say 100) to get the license and copyright info. I don't need any other details such as class/filetype/filename etc. What should the command be for that?
Philippe Ombredanne
@pombredanne
@guddutopper I was asking if you could try first with scancode --package -n4 --json-pp <your scan file name>.json for a start, to focus on the package manifests
ak-iitb
@ak-iitb
@pombredanne so this JSON file would be used as an input when I next run scancode with the -cl option, to speed up the process?
Philippe Ombredanne
@pombredanne
@guddutopper nope, this is more to get a quicker set of results.
I am worried otherwise as to why a scan of 120 jars would take so long
ak-iitb
@ak-iitb
@pombredanne using --package helped. Could you also tell me a way to remove the file information table from the report? I only want the license and copyright info, but file info is automatically getting reported. I have only used the -pcl option.
Philippe Ombredanne
@pombredanne
@guddutopper which report?
Roshan Thomas
@Thomshan
I believe @guddutopper is referring to the HTML report. This was a problem for me as well since the file information table in the HTML report is massive in medium-large sized projects. MS Edge would constantly crash for me because there was just so much to render. I finally created a script to remove all lines pertaining to the file information table in the html file as a workaround.
Philippe Ombredanne
@pombredanne
@Thomshan @guddutopper ok, the HTML report is always going to be limited. The volume of data really means the JSON is more appropriate as an input to the workbench.
alternatively we could get a plugin to help filter things out. In all cases, having an issue is much welcomed :P
Roshan Thomas
@Thomshan
I have a question with regard to scanning ".class" files. If I'm not wrong, ".class" files do not retain any comments in them (so there won't be any license text). I used scancode to run a scan on a jar (jaxen-1.1.3.jar) and scancode reported the presence of a proprietary license in one of the .class files of this jar (jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class). I used an IDE to investigate and couldn't find anything at the line number reported by scancode. Any idea how/why this could have happened?
Philippe Ombredanne
@pombredanne

@Thomshan try this:
scancode --license --license-text --license-text-diagnostics --json-pp jaxen.json jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class

the results:

          "start_line": 134,
          "end_line": 134,
          "matched_rule": {
            "identifier": "proprietary-license_276.RULE",
            "license_expression": "proprietary-license",
            "licenses": [
              "proprietary-license"
            ],
....
            "matcher": "2-aho",
            "rule_length": 4,
            "matched_length": 4,
            "match_coverage": 100.0,
            "rule_relevance": 100
          },
          "matched_text": "may not be modified"

and ...

$ strings jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class | grep -A2 -B2 "may not be modified"
org/jaxen/dom/NamespaceNode
org/w3c/dom/DOMException
"Namespace node may not be modified
org.w3c.dom.Node
java/lang/Class

The thing is that 1. class files can contain copyright and licenses in literals and texts. 2. scancode does collect these strings in binaries

the fix is going to be adding a new RULE with this .yml:

    is_false_positive: yes

and some

    notes: not a license reference, seen in Jaxen

and this text in the .RULE:

    node may not be modified

note that on https://repo1.maven.org/maven2/jaxen/jaxen/1.1.3/jaxen-1.1.3-sources.jar

$ scancode --license --license-text --license-text-diagnostics --json-pp - jaxen-1.1.3-sources.jar-extract/org/jaxen/dom/NamespaceNode.java

will also match the same text
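To see why a compiled .class file can still yield text for a license rule to match, here is a rough Python approximation of what the strings utility does: pull runs of printable ASCII out of arbitrary bytes and look for the phrase. This is a sketch for illustration only, not how scancode itself collects text from binaries; the sample bytes are made up and only mimic a class file's constant pool.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII, roughly like the `strings` utility."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Made-up bytes: string literals survive compilation as plain text,
# separated by non-printable bytes.
sample = (
    b"\xca\xfe\xba\xbe\x00\x01org/jaxen/dom/NamespaceNode"
    b"\x00\x01Namespace node may not be modified"
    b"\x00\x01org.w3c.dom.Node\x00"
)

hits = [s for s in printable_strings(sample) if "may not be modified" in s]
print(hits)
```

The exception message is an ordinary string literal, so it survives in the binary, and a short four-word rule will match it there just as it does in the .java source.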
Philippe Ombredanne
@pombredanne
using --license-text to get the match text lines (and additionally and optionally --license-text-diagnostics to only get the strictly matched words) is useful to find possible errors
Also proprietary-license_276.RULE relevance should NOT be 100 but rather 70 ... as this is not a super conclusive short rule, as witnessed by your problem.
Would you mind drafting a ticket with all these details?
Roshan Thomas
@Thomshan
Got it. Sure, I'll draft a ticket. Thank you.
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi, I am trying to run scancode-toolkit version 21.6.7 on Docker, but it seems to fail each time with the error below:

ERROR: Cannot install scancode-toolkit==21.6.7 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

The conflict is caused by:
scancode-toolkit 21.6.7 depends on pygments
The user requested (constraint) pygments

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

The command '/bin/sh -c ./scancode --help' returned a non-zero code: 1

is there something i can do to resolve this issue?
Philippe Ombredanne
@pombredanne
hum
@balakrishna-mukundaraj do you mind to enter an issue? that's a bug for sure
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi @pombredanne , please find the bug details below
Docker fails to run on scancode-toolkit 21.6.7 #2554
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj thanks! I am looking into this
balakrishna-mukundaraj
@balakrishna-mukundaraj

Hi @pombredanne

while running a scan on a source we had an EPL license match, which was:

/*
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Public License 2.0 which is available at
 * http://www.eclipse.org/legal/epl-2.0
 *
 * SPDX-License-Identifier: EPL-2.0
 */
Even though the block clearly says that it is an EPL license, the result from scancode said epl-2.0 OR apache-2.0 (matched from the rule epl-2.0_or_apache-2.0_2.RULE). Is there any way to fix this issue? There is no Apache license found anywhere in the file.
It also matches epl-2.0_or_apache-2.0_or_gpl-2.0_with_openjdk-exception.RULE in some cases when only an EPL license is found.
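For cross-checking a result like this outside scancode, one option is to read the SPDX-License-Identifier tag straight from the header and compare it with the reported expression. The sketch below is a plain regex helper written for this example (the name spdx_expressions is made up); it is not how scancode performs detection, and it does not validate the expression.

```python
import re

# Capture everything after the tag up to the end of the line.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*(.+)")


def spdx_expressions(text):
    """Return the raw SPDX expressions declared in `text`, in order."""
    found = []
    for match in SPDX_TAG.finditer(text):
        expr = match.group(1).strip()
        # Drop a comment terminator that closes the block on the same line.
        if expr.endswith("*/"):
            expr = expr[:-2].strip()
        if expr:
            found.append(expr)
    return found


header = """/*
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Public License 2.0 which is available at
 * http://www.eclipse.org/legal/epl-2.0
 *
 * SPDX-License-Identifier: EPL-2.0
 */"""

print(spdx_expressions(header))
```

If the declared tag is a single license while the scan reports an OR expression, that is a useful detail to include when reporting the rule mismatch.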