This is a channel focused on ScanCode support and not as noisy as the main discuss channel
unknown words: DO NOT exist anywhere in any RULE or LICENSE. They can be seen only in the Query.unknowns_by_pos where we only track how many unknown words exist after a known word position. They are not present in the ispan nor the qspan.
stopwords: exist in RULEs and LICENSEs, but are short, too common words to be useful. They are skipped both on the index and query side. They are not present in the ispan nor the qspan. They can be seen only in the Query.stopwords_by_pos where we only track how many stopwords exist after a known word position.
By construction, key_phrase_span should be :
Therefore, it should be possible to do key_phrase_span in match.qspan.
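To make the bookkeeping above concrete, here is a minimal toy sketch in plain Python. It is not ScanCode's code: tokenize_query, the tiny dictionary and stopword sets, and the use of -1 for "before the first known word" are all assumptions for illustration, and plain sets stand in for spans. It only shows that known words get positions while unknown words and stopwords are merely counted after the last known position.

```python
from collections import defaultdict


def tokenize_query(text, dictionary, stopwords):
    """Toy tokenizer (hypothetical helper, not ScanCode's API).

    Return (known_words, unknowns_by_pos, stopwords_by_pos). Only known
    words get a position; unknown words and stopwords are counted against
    the position of the last known word seen before them.
    """
    known_words = []
    unknowns_by_pos = defaultdict(int)
    stopwords_by_pos = defaultdict(int)
    pos = -1  # toy convention: -1 means "before the first known word"
    for word in text.lower().split():
        if word in stopwords:
            stopwords_by_pos[pos] += 1
        elif word in dictionary:
            pos += 1
            known_words.append(word)
        else:
            unknowns_by_pos[pos] += 1
    return known_words, dict(unknowns_by_pos), dict(stopwords_by_pos)


# Assumed toy index vocabulary, as if built from a single rule.
dictionary = {"creative", "commons", "attribution", "international", "license"}
stopwords = {"a", "the"}
text = "under a Creative Commons Attribution NonCommercial International License"

known_words, unknowns_by_pos, stopwords_by_pos = tokenize_query(
    text, dictionary, stopwords
)

# Spans only ever hold known positions, so containment is a subset test.
qspan = set(range(len(known_words)))
key_phrase_span = {0, 1, 2, 3, 4}
key_phrase_in_qspan = key_phrase_span <= qspan
```

In this toy run the containment test passes even though "NonCommercial" sits in the middle of the phrase, because an unknown word has no position of its own; it only shows up as a count in unknowns_by_pos.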
I reckon the code snippet above is for the next step.
Yes, that's what I was trying to communicate. So that under the {{Creative Commons Attribution 4.0 International License}} (the "License"); won't match under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
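One way to picture the intended behavior is a check that a key phrase was matched without anything wedged inside it. The sketch below is an assumption about how such a check could look, not the actual ScanCode implementation: key_phrase_is_intact and its arguments are hypothetical names, and positions are plain integers on the query side.

```python
def key_phrase_is_intact(key_phrase_positions, matched_positions, unknowns_by_pos):
    """Return True if the key phrase was matched as one uninterrupted run.

    Hypothetical helper for illustration only.
    key_phrase_positions: query positions where the key phrase words matched.
    matched_positions: all query positions covered by the match.
    unknowns_by_pos: mapping of position -> count of unknown words after it.
    """
    positions = sorted(key_phrase_positions)
    if not positions:
        return True
    # 1. Every key phrase word must be part of the match.
    if not set(positions) <= set(matched_positions):
        return False
    # 2. No known-but-unmatched word may sit inside the phrase.
    if positions != list(range(positions[0], positions[-1] + 1)):
        return False
    # 3. No unknown word may follow any phrase word except the last one.
    return not any(unknowns_by_pos.get(pos, 0) for pos in positions[:-1])


# Exact text: the phrase words are contiguous with nothing in between.
exact = key_phrase_is_intact([0, 1, 2, 3, 4], {0, 1, 2, 3, 4}, {})

# Tiny test index: "NonCommercial" and "ShareAlike" are unknown words,
# counted after position 2 ("Attribution").
small_index = key_phrase_is_intact([0, 1, 2, 3, 4], {0, 1, 2, 3, 4}, {2: 2})

# Full index: the same two words are known (positions 3 and 4) but are
# not part of this match, so the phrase positions have a hole.
full_index = key_phrase_is_intact([0, 1, 2, 5, 6], {0, 1, 2, 5, 6}, {})
```

The last two cases differ only in how large the index vocabulary is: the interruption shows up as an unknown-word count in one and as a gap in positions in the other.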
The key_phrase_span is a single key phrase.
yes, remember my comment on your PR... if you have only one rule in your test index, the universe of unknown words is very large :D
That was just a bad attempt to make my example easier to run; the same problem persists when running as a data-driven test (they use the full index, right?). Hence why I am taking another look.
cc-by-nc-sa-4.0 match IMHO
--max-in-memory to use disk-caching...