Philippe Ombredanne
@pombredanne
@Thomshan that's no trouble! That was a useful exercise :) Come back at any time with any question :)
Ansh Srivastava
@anshsrtv

I got the following error while running make dev in an Ubuntu 18 terminal:

ERROR: Could not find a version that satisfies the requirement psycopg2==2.8.6
ERROR: No matching distribution found for psycopg2==2.8.6

What's the workaround or is this an issue to be solved?

Philippe Ombredanne
@pombredanne
@anshsrtv Is this on Windows WSL? Otherwise, do you have what is needed to compile? Is this on the X86_64 architecture?
@tdruez ^
Ashwin Raj
@ashwinraj-in
Hi everyone. I am planning to contribute to improving the PyPI package license detection results. Can someone give me a head start on the approach I should take?
Philippe Ombredanne
@pombredanne
Hi :) So the primary approach would be to drive that from data, lots of it, e.g. start with stats on the declared license data of all the PyPI packages. Then, based on that, work out detection tests, identify issues, and possibly create new rules, code, and mappings.
Ashwin Raj
@ashwinraj-in
@pombredanne Is there any scope for NLP techniques that we could apply to improve the results?
S3j5b0
@S3j5b0
Hi, I've been asked to implement the tool as part of a CI chain with GitHub Actions. Is it possible to not necessarily output a file, but somehow indicate that license incompatibilities have been detected, and throw back an error or something? :D
Ayan Sinha Mahapatra
@AyanSinhaMahapatra
@S3j5b0 I think this warrants a ticket :P though to my knowledge no, we are far from adding something that would throw back errors in case of license policy incompatibilities. There's a license policy plugin that might interest you -> https://scancode-toolkit.readthedocs.io/en/latest/plugins/licence_policy_plugin.html but due to the somewhat complex nature of license policy incompatibilities, there isn't a ready CI automation solution available at present.
Philippe Ombredanne
@pombredanne
@AyanSinhaMahapatra good point!
@S3j5b0 incompatibility is not universal: it depends on the usage context and on the policies of both the author/distributor of a product and its customer/user.
You could have a policy that prohibits Apache-2.0-licensed code while mine could mandate its use.
Philippe Ombredanne
@pombredanne
And another could state that using LGPL-licensed code is OK only if it is unmodified and linked dynamically.
Policies, modifications, and linking style are not things that can be easily determined. (Yet I wish we could do so... we are working on it ;) )
Even things that look like context-free facts are not that easy to deal with. For instance, the FSF states that the GPL-2.0 is incompatible with the Apache-2.0 license, but that the GPL-3.0 is compatible.
Yet, say that some Apache-licensed code uses a GPL-2.0-licensed tool unmodified and spawned in its own independent process. In this case the FSF may say this is OK and that there may not be a compatibility issue.
Philippe Ombredanne
@pombredanne
i.e. the same code, when used differently, may or may not trigger compat issues.
I wish things would be simpler... but they are not :P
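For the CI question above, one rough way to get a failing build is a small wrapper that reads the JSON scan output and checks it against a project-specific deny list. This is only a sketch under assumptions: the `files` / `licenses` / `key` layout is assumed from the JSON output and may differ between ScanCode versions, and `DENIED_KEYS` is a hypothetical policy, not something ScanCode defines.

```python
# Hypothetical policy: license keys that this particular project refuses.
DENIED_KEYS = frozenset({"gpl-2.0", "proprietary-license"})


def find_denied(scan, denied=DENIED_KEYS):
    """Return sorted (path, key) pairs whose license key is on the deny list.

    Assumes each entry of scan["files"] has a "path" and a "licenses" list
    of dicts with a "key"; adjust this to the ScanCode version in use.
    """
    hits = set()
    for entry in scan.get("files", []):
        for lic in entry.get("licenses") or []:
            if lic.get("key") in denied:
                hits.add((entry.get("path", ""), lic["key"]))
    return sorted(hits)


def exit_code(scan, denied=DENIED_KEYS):
    """Return 1 when any denied license is found, else 0."""
    return 1 if find_denied(scan, denied) else 0
```

A CI step could load the scan with `json.load` and call `sys.exit(exit_code(scan))`; the non-zero exit status is what fails the job. The deny list only encodes one policy in one context, which is exactly the limitation described above.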
balakrishna-mukundaraj
@balakrishna-mukundaraj
Hi, I am trying to add a plug-in for a new output format. It used to work in a previous version (v3.2.1rc2), but with the latest released version it gives me a "Missing output option(s): at least one output option is required to save scan results." error. Is there any specific change in the latest version that I might have to look into?
Philippe Ombredanne
@pombredanne
@balakrishna-mukundaraj hum... is this public code?

there have been quite a few changes since 3.2.1rc2:
https://github.com/nexB/scancode-toolkit/compare/v3.2.1rc2...develop
like over 1000 commits

Showing 25,888 changed files with 281,590 additions and 385,844 deletions.

Philippe Ombredanne
@pombredanne

@balakrishna-mukundaraj that said the key change seems to be


@output_impl
class JsonPrettyOutput(OutputPlugin):

    options = [
        CommandLineOption(('--json-pp', 'output_json_pp',),
            type=FileOptionType(mode=mode, lazy=True),
            metavar='FILE',
            help='Write scan output as pretty-printed JSON to FILE.',
            help_group=OUTPUT_GROUP,
            sort_order=10),
    ]

which now becomes:

@output_impl
class JsonPrettyOutput(OutputPlugin):

    options = [
        PluggableCommandLineOption(('--json-pp', 'output_json_pp',),
            type=FileOptionType(mode='w', encoding='utf-8', lazy=True),
            metavar='FILE',
            help='Write scan output as pretty-printed JSON to FILE.',
            help_group=OUTPUT_GROUP,
            sort_order=10),
    ]
ak-iitb
@ak-iitb
Hi, I have a very basic query: does ScanCode detect license files in the source code, or does it generate the license files by looking at the libraries used in the code? For example, I am building something in Java and I use multiple open source libraries. If I scan my code with ScanCode, would it provide me the list of the libraries used and the licenses associated with them?
Philippe Ombredanne
@pombredanne

@guddutopper yes and no.
So the --package option will detect the packages and report dependencies (say in a pom.xml). So you will get the list this way, at least the list of direct dependencies.

It will not (yet) resolve or fetch the dependency tree to analyze it.
The dependencies would have to be in the scanned dir to be analyzed.
They would likely need to be extracted first with extractcode too, at least for now.

So if you scan your built, as-deployed app, you should normally have them all.
ak-iitb
@ak-iitb
@pombredanne yes, I have to extract the JAR files using the extractcode command, and then it is able to generate a report for the licenses and copyrights. However, it gives a combined report. Could it provide the license and copyright report per JAR file?
Philippe Ombredanne
@pombredanne
@guddutopper that's a great idea for an issue for a new feature.
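Until such a feature exists, a per-JAR view can be approximated by post-processing the combined JSON scan. The sketch below groups license keys by the extracted archive directory. It assumes extracted archives sit in directories ending in `-extract` (as in `jaxen-1.1.3.jar-extract`) and that file entries have a `path` and a `licenses` list with a `key`, which may vary by ScanCode version; `licenses_per_archive` is a hypothetical helper name.

```python
from collections import defaultdict

# Directory suffix of extracted archives, e.g. "jaxen-1.1.3.jar-extract".
EXTRACT_SUFFIX = "-extract"


def licenses_per_archive(scan):
    """Map each extracted archive directory to its sorted license keys."""
    report = defaultdict(set)
    for entry in scan.get("files", []):
        parts = entry.get("path", "").split("/")
        # Group by the first path segment that looks like an extracted archive.
        archive = next((p for p in parts if p.endswith(EXTRACT_SUFFIX)), None)
        if archive is None:
            continue  # file is not inside an extracted archive
        for lic in entry.get("licenses") or []:
            if lic.get("key"):
                report[archive].add(lic["key"])
    return {name: sorted(keys) for name, keys in report.items()}
```

Nested archives are attributed to the outermost extracted directory here; that is a design choice of this sketch, not ScanCode behavior.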
ak-iitb
@ak-iitb
@pombredanne I had 120 JAR files and extracted them all using extractcode. Now when I run the scancode command, it has been stuck for 3 hours.
Philippe Ombredanne
@pombredanne
:|
@guddutopper how many --processes?
-n4.... :|
You may want to start with a scancode --package -n4 and no -cl yet?
ak-iitb
@ak-iitb
@pombredanne I did not understand. Should I not use the -cl option? I just want to scan the extracted JAR files (say 100) to get the license and copyright info. I don't need any other details such as class/filetype/filename etc. What should the command be for that?
Philippe Ombredanne
@pombredanne
@guddutopper I was asking if you could try first with scancode --package -n4 --json-pp <your scan file name>.json for a start, to focus on the package manifests
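When driving this from a script, the package-only scan suggested above can be assembled as an argument list. This is a minimal sketch: the flags are the ones quoted in this thread, the scanned directory is passed as the last argument, and the function names are hypothetical.

```python
import subprocess


def build_package_scan_command(target, output_json, processes=4):
    """Build the argv for a package-only scan saved as pretty-printed JSON."""
    return [
        "scancode",
        "--package",
        "-n", str(processes),
        "--json-pp", output_json,
        target,
    ]


def run_package_scan(target, output_json, processes=4):
    """Run the scan; raises CalledProcessError if scancode exits non-zero."""
    command = build_package_scan_command(target, output_json, processes)
    subprocess.run(command, check=True)
```

Passing a list rather than a shell string avoids quoting problems with directory names that contain spaces.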
ak-iitb
@ak-iitb
@pombredanne so this JSON file would be used as an input when I next run scancode with the -cl option, to speed up the process?
Philippe Ombredanne
@pombredanne
@guddutopper nope, this is more to get a quicker set of results.
I am worried otherwise as to why a scan of 120 jars would take so long
ak-iitb
@ak-iitb
@pombredanne using --package helped. Could you also tell me a way to remove the file information table from the report? I only want the license and copyright info, but file info is automatically getting reported. I have only used the -pcl option.
Philippe Ombredanne
@pombredanne
@guddutopper which report?
Roshan Thomas
@Thomshan
I believe @guddutopper is referring to the HTML report. This was a problem for me as well since the file information table in the HTML report is massive in medium-large sized projects. MS Edge would constantly crash for me because there was just so much to render. I finally created a script to remove all lines pertaining to the file information table in the html file as a workaround.
Philippe Ombredanne
@pombredanne
@Thomshan @guddutopper ok, the HTML report is always going to be limited. Given the volume of data, the JSON output is more appropriate, as an input to the workbench.
Alternatively, we could get a plugin to help filter things out. In any case, an issue would be very welcome :P
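In the meantime, filtering can be done on the JSON output instead of editing the HTML. The sketch below keeps only a few keys per file entry. The key names `path`, `licenses`, and `copyrights` are assumptions about the JSON layout of the ScanCode version in use, and `slim_scan` is a hypothetical helper, not a ScanCode plugin.

```python
# Keys to keep for every file entry; everything else is dropped.
KEEP = ("path", "licenses", "copyrights")


def slim_scan(scan, keep=KEEP):
    """Return a shallow copy of the scan whose file entries keep only `keep`."""
    slim = dict(scan)
    slim["files"] = [
        {key: entry[key] for key in keep if key in entry}
        for entry in scan.get("files", [])
    ]
    return slim
```

The input scan is left unmodified, so the full results remain available if they are needed later.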
Roshan Thomas
@Thomshan
I have a question with regard to scanning ".class" files. If I'm not wrong, ".class" files do not retain any comments in them (so there won't be any license text). I used scancode to run a scan on a jar (jaxen-1.1.3.jar) and scancode reported the presence of a proprietary license in one of the .class files of this jar (jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class). I used an IDE to investigate and couldn't find anything at the line number reported by scancode. Any idea how/why this could have happened?
Philippe Ombredanne
@pombredanne

@Thomshan try this:
scancode --license --license-text --license-text-diagnostics --json-pp jaxen.json jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class

the results:

          "start_line": 134,
          "end_line": 134,
          "matched_rule": {
            "identifier": "proprietary-license_276.RULE",
            "license_expression": "proprietary-license",
            "licenses": [
              "proprietary-license"
            ],
....
            "matcher": "2-aho",
            "rule_length": 4,
            "matched_length": 4,
            "match_coverage": 100.0,
            "rule_relevance": 100
          },
          "matched_text": "may not be modified"

and ...

$ strings jaxen-1.1.3.jar-extract/org/jaxen/dom/NamespaceNode.class | grep -A2 -B2 "may not be modified"
org/jaxen/dom/NamespaceNode
org/w3c/dom/DOMException
"Namespace node may not be modified
org.w3c.dom.Node
java/lang/Class

The thing is that: 1. class files can contain copyrights and licenses in string literals and texts; 2. scancode does collect these strings in binaries.
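To see why a compiled file can still match, here is a rough Python equivalent of the `strings ... | grep` check above. It is a simplified sketch (printable ASCII runs only) and is not how ScanCode itself collects strings from binaries; the function names are hypothetical.

```python
import re


def printable_strings(data, min_len=4):
    """Yield runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def strings_containing(data, phrase, min_len=4):
    """Return the printable strings of `data` that contain `phrase`."""
    return [s for s in printable_strings(data, min_len) if phrase in s]
```

Calling `strings_containing(open(path, "rb").read(), "may not be modified")` on the .class file should surface the exception message literal that triggered the match.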

The fix is going to be adding a new RULE with this .yml:
is_false_positive: yes
notes: not a license reference, seen in Jaxen
and this text in the .RULE:
node may not be modified

Note that on https://repo1.maven.org/maven2/jaxen/jaxen/1.1.3/jaxen-1.1.3-sources.jar, running
$ scancode --license --license-text --license-text-diagnostics --json-pp - jaxen-1.1.3-sources.jar-extract/org/jaxen/dom/NamespaceNode.java
will also match the same text.
Philippe Ombredanne
@pombredanne
Using --license-text to get the matched text lines (and, optionally, --license-text-diagnostics to get only the strictly matched words) is useful for finding possible errors.
Also, the relevance of proprietary-license_276.RULE should NOT be 100 but rather 70, as this is not a super conclusive short rule, as witnessed by your problem.
Would you mind drafting a ticket with all these details?
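While reviewing results, matches from very short rules like this one can be listed for manual inspection. The sketch below assumes each match is shaped like the JSON excerpt above (a `matched_rule` dict with `identifier` and `rule_length`, plus `matched_text`); the threshold of 5 words is an arbitrary, hypothetical cutoff.

```python
def is_weak_match(match, max_rule_length=5):
    """Heuristic: a match against a very short rule is rarely conclusive."""
    length = match.get("matched_rule", {}).get("rule_length")
    return length is not None and length <= max_rule_length


def weak_matches(matches, max_rule_length=5):
    """Return (rule identifier, matched text) pairs worth a manual review."""
    return [
        (m["matched_rule"].get("identifier"), m.get("matched_text"))
        for m in matches
        if is_weak_match(m, max_rule_length)
    ]
```

This only flags candidates for review; whether a short match is a real false positive still needs a human look at the matched text.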