Philippe Ombredanne
@pombredanne
@rijusougata13 then run "scancode" proper to run a scan. For instance: scancode -clipeu --json-pp - samples -n4
Henrik Sandklef
@hesa
Sorry to bug you all on this list. I have a question about nexB's license-expression. Where do I ask this?
Philippe Ombredanne
@pombredanne
@hesa you do not bug anyone. You can ask here alright :P
Henrik Sandklef
@hesa
Excellent :)
I wrote some words at the end of this issue over at maxhbr/LDBcollector#4
In short, how do I add a "translation" of a license (e.g. GPLv2 to GPL-2.0-only)? Do you have a procedure?
I have some 10-20 license expressions from Yocto that make https://github.com/vinland-technology/flict scream and shout
I'd rather add the translation to license-expression than to flict
Philippe Ombredanne
@pombredanne
me thinks....
@hesa what you call a translation is a license detection (or you could call it a normalization).
This would typically be a job for scancode-toolkit
But you can also use the license expression library for a more constrained approach
The license expression parsing operates on license "symbols" each consisting of a key (say the SPDX id or the scancode key) and one or more "aliases" that can be arbitrary strings.
Philippe Ombredanne
@pombredanne
The translation would be to use GPLv2 as an alias
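
A minimal sketch of the alias idea in plain Python, to make the "key plus aliases" model concrete. This is not the license-expression API: the `Symbol` class, the `build_alias_index` and `normalize` helpers, and every alias string other than GPLv2 are made up for illustration, and matching case-insensitively is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a license symbol: a key plus free-form aliases."""
    key: str
    aliases: tuple = ()


def build_alias_index(symbols):
    """Map every lowercased key and alias to its canonical key."""
    index = {}
    for symbol in symbols:
        for name in (symbol.key, *symbol.aliases):
            index[name.lower()] = symbol.key
    return index


def normalize(name, index):
    """Return the canonical key for a name, or None when it is unknown."""
    return index.get(name.strip().lower())


symbols = [
    Symbol("GPL-2.0-only", aliases=("GPLv2", "GPL v2")),
    Symbol("MIT", aliases=("MIT License",)),
]
index = build_alias_index(symbols)
print(normalize("GPLv2", index))  # prints GPL-2.0-only
```

With this shape, a Yocto-style name only needs one extra alias entry on the right symbol; nothing else changes.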
Henrik Sandklef
@hesa
Whatever we call it, I would like license-expression to be able to go from "GPLv2" to "GPL-2.0-only" :)
Philippe Ombredanne
@pombredanne
@hesa actually, adding a list of aliases to each license record may be the best
https://github.com/maxhbr/LDBcollector/issues/4#
this way they are closest together
and license-expression would just consume this
Philippe Ombredanne
@pombredanne
@hesa at some stage I would also like to integrate flict in scancode.io :)
1 reply
separate topic :)
Henrik Sandklef
@hesa
@pombredanne I am not sure how to add an "alias". Will this be a new concept in scancode? If so, I'll propose some syntax. If "alias" is already a concept, please show me an example :)
Maximilian Huber
@maxhbr
Henrik Sandklef
@hesa
@maxhbr The column "True", sometimes with the value "False" :), what is it for?
Henrik Sandklef
@hesa
@maxhbr Have these aliases been verified? E.g. in the table linked above, X11 is an alias for ICU, but the license texts differ, although they are similar, so I am curious what a lawyer would say. Note: I would love to see more aliases in license-expression and your list is impressive :)
From a license compatibility point of view, they're compatible ("the same"), but not when attributing the license (text)?
Maximilian Huber
@maxhbr
The third column indicates whether that is a unique mapping. So if the same alias appears to be mapped to multiple licenses or was flagged ambiguous in the beginning, it is set to "False"
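
A rough sketch of how such a uniqueness flag could be derived, assuming a two-column input of alias and license id. The sample rows and the `flag_unique` helper are made up for illustration and are not taken from the real aliases.csv; it only covers the "mapped to multiple licenses" case, not aliases that were flagged ambiguous up front.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows (alias, license id), not the real aliases.csv content.
SAMPLE = """\
GPLv2,GPL-2.0-only
X11 License,X11
X11 License,ICU
MIT License,MIT
"""


def flag_unique(rows):
    """Return (alias, license_id, is_unique) triples.

    An alias counts as unique when it maps to exactly one license id.
    """
    targets = defaultdict(set)
    for alias, license_id in rows:
        targets[alias].add(license_id)
    return [(alias, lid, len(targets[alias]) == 1) for alias, lid in rows]


rows = [tuple(row) for row in csv.reader(io.StringIO(SAMPLE))]
for alias, license_id, is_unique in flag_unique(rows):
    print(f"{alias},{license_id},{is_unique}")
```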
Henrik Sandklef
@hesa
Ah nice
Maximilian Huber
@maxhbr
The x11 / ICU clash comes from the scancode data and was already discussed in https://github.com/maxhbr/LDBcollector/issues/4#
Henrik Sandklef
@hesa
Yes. I am curious if the list of aliases can be used in license-expression (by first adding it to https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses).

Looking at the scancode files:
$ head -7 x11.yml
key: x11
short_name: X11 License
name: X11 License
category: Permissive
owner: XFree86 Project, Inc
homepage_url: http://www.xfree86.org/3.3.6/COPYRIGHT2.html
spdx_license_key: ICU

Is this the origin of "your" circular alias?

Maximilian Huber
@maxhbr
yes
in the SPDX list, X11 and ICU are two independent licenses, and this joins these two. this violates the rule that the main IDs never have clashes with aliases ...
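
The rule can be checked mechanically. A sketch, assuming licenses are held as a mapping from main ID to aliases and that names are compared case-insensitively; the `find_clashes` helper and the data layout are hypothetical, with entries modelled on the x11.yml snippet above.

```python
def find_clashes(licenses):
    """Return sorted (alias, owner_id, clashing_id) triples.

    A clash is an alias that equals the main ID of a different license.
    `licenses` maps a main ID to an iterable of aliases.
    """
    main_ids = {license_id.lower(): license_id for license_id in licenses}
    clashes = []
    for owner, aliases in licenses.items():
        for alias in aliases:
            other = main_ids.get(alias.lower())
            if other is not None and other != owner:
                clashes.append((alias, owner, other))
    return sorted(clashes)


# Illustrative data: the scancode key "x11" is treated as an alias of SPDX
# "ICU", while SPDX also has a separate license whose main ID is "X11".
licenses = {
    "ICU": ["x11", "X11 License"],
    "X11": ["x11-xconsortium"],
    "MIT": ["MIT License"],
}
print(find_clashes(licenses))  # prints [('x11', 'ICU', 'X11')]
```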
Henrik Sandklef
@hesa
Hmmm... OK :(
Philippe Ombredanne
@pombredanne

If "alias" is already a concept, please show me an example :)

an alias is something in license-expression, but not in scancode. IMHO this would be a list of strings named aliases

Philippe Ombredanne
@pombredanne

@maxhbr re:

The LDBcollector already has these aliases: https://github.com/maxhbr/LDBcollector/blob/generated/aliases/aliases.csv#L686

This would be perfect!

Philippe Ombredanne
@pombredanne

@maxhbr @hesa re:

The x11 / ICU clash comes from the scancode data and was already discussed in https://github.com/maxhbr/LDBcollector/issues/4#

I would rather say that SPDX got it differently, possibly not right.
But basically:

  1. what scancode calls x11 and SPDX calls ICU originated at X11 not ICU. It is mapped to SPDX ICU alright in SC data.
  2. what scancode calls x11-xconsortium and SPDX calls X11 is really specific to the X consortium. It is mapped to SPDX X11 alright in SC data.

But I reckon this may need some thinking wrt. use as aliases. The simplest may be to ignore the scancode key in the set of license symbols and only use SPDX and aliases

@hesa if you'd like to join, there is a weekly call starting at https://meet.jit.si/AboutCode now
Ayan Sinha Mahapatra
@AyanSinhaMahapatra
Philippe Ombredanne
@pombredanne
thx
Aditya Sangave
@adii21-Ux
Hi everyone, I am Aditya, an undergrad student from India. I just finished setting up my development environment for scancode because I want to contribute to this project. I am good with Python, Django, HTML and CSS, so if there are any issues I can work on to understand the codebase, please let me know.
Aditya Sangave
@adii21-Ux
Hello, I was going through the scancode-toolkit documentation, and here (https://scancode-toolkit.readthedocs.io/en/latest/getting-started/newcomer.html) I found that the same three points about scans are repeated in two sections, namely Try Scancode Toolkit and Installing Scancode. I don't think it's necessary in both topics.
Philippe Ombredanne
@pombredanne
@adii21-Ux good catch :) the doc needs some significant love alright (much more than just a few typos)
Aditya Sangave
@adii21-Ux
should I fix this and open a PR?
Philippe Ombredanne
@pombredanne
@adii21-Ux sure thing, and having something that touches code and not just doc is always welcome too
Aditya Sangave
@adii21-Ux
ok, I'll see if I can make some other changes too
Mike Rombout
@mrombout

I am having a bit of trouble understanding Query.unknowns_by_pos. As far as I can tell, the query tests/licensedcode/data/datadriven/external/fossology-tests/BSD/lz4.license.txt matches the rule src/licensedcode/data/rules/bsd-simplified_and_gpl-2.0_1.RULE exactly (apart from everything after line 7). Yet in the refine_matches phase (second iteration) it reports unknowns_by_pos = defaultdict(<class 'int'>, {43: 0, 41: 0, 15: 0, 21: 0, 20: 0}). I am particularly surprised by 15, 20 and 21, and this is throwing off #2637.

I was under the impression that a token is considered unknown if token not in query.idx.dictionary. But it is not that simple?
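
One plain-Python detail that may matter when reading that printed value: a `defaultdict(int)` stores a key with value 0 as soon as a missing key is read with `d[key]`, so zero-count keys can appear purely from lookups. The sketch below is not scancode's code. It is a simplified model that assumes counts are keyed by the position of the preceding known token, with a made-up vocabulary and token list; whether this is what happens in refine_matches is also an assumption.

```python
from collections import defaultdict


def count_unknowns_by_pos(tokens, dictionary):
    """Count unknown tokens, keyed by the position of the preceding known token.

    Positions only advance on known tokens; unknown tokens that come before
    the first known token are keyed by -1.
    """
    unknowns = defaultdict(int)
    known_pos = -1
    for token in tokens:
        if token in dictionary:
            known_pos += 1
        else:
            unknowns[known_pos] += 1
    return unknowns


dictionary = {"redistribution", "and", "use", "permitted"}  # made-up vocabulary
tokens = ["redistribution", "and", "use", "foo", "bar", "permitted"]
unknowns = count_unknowns_by_pos(tokens, dictionary)
print(dict(unknowns))  # {2: 2}

# Reading a missing key with [] inserts it with the default value 0.
for pos in (0, 1):
    _ = unknowns[pos]
print(dict(unknowns))  # {2: 2, 0: 0, 1: 0}

# .get() reads without inserting.
unknowns.get(3, 0)
print(3 in unknowns)  # False
```

If lookups like these are the cause, the keys with value 0 do not mark unknown tokens at all, and reading with `.get(pos, 0)` would avoid creating them.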

Mike Rombout
@mrombout

I'm afraid I have another case that I'm not able to work out. Another one where the ispan is too inclusive: https://github.com/softsense/scancode-toolkit/blob/issue-2637-allow-license-rules-to-require-the-presence-of-certain-defining-keywords/tests/licensedcode/test_match.py#L325

The ispan of the match contains Span(2,22), but I feel it should be Span(2,4)|Span(7...) so that it does not include the key phrase of Span(2,8)

Philippe Ombredanne
@pombredanne
@mrombout hey :wave: ...let me check.
Philippe Ombredanne
@pombredanne
@mrombout https://github.com/softsense/scancode-toolkit/pull/1/files#r758425422 you are being misled by the weirdness that may exist in very small indexes
and maybe by the actual nature of what is in an ispan?