Philippe Ombredanne
@pombredanne

@thatch re:

it looks like the parse_metadata function expects a metadata.json -- was that just the transformation described under pep 566, or is there a better description?

In older wheel versions we had that file created. That's mostly legacy at this stage, as it has since been dropped. We still need to process a whole PyPI archive eventually, so that's why we keep this. BUT this is questionable at best and SHOULD BE documented, which it is not.

We have a test file here https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/pypi/metadata.json

And this was dropped, like 3 eons ago in wheel proper:
pypa/wheel@595e4a8
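For readers unfamiliar with that legacy file: PEP 566 describes a straightforward JSON transformation of the email-style Core Metadata (keys lowercased, dashes replaced by underscores, multiple-use fields turned into lists). A rough, illustrative sketch of that transformation (the field set is abbreviated, and this is not the exact wheel implementation):

```python
# Illustrative sketch of the PEP 566 JSON transformation that produced
# the legacy metadata.json: Core Metadata keys are lowercased, dashes
# become underscores, and multiple-use fields become lists.
import json
from email.parser import HeaderParser

# Abbreviated subset of the multiple-use Core Metadata fields.
MULTIPLE_USE = {"classifier", "requires_dist", "provides_dist",
                "requires_external", "project_url", "platform"}

def pkginfo_to_json_dict(pkg_info_text):
    headers = HeaderParser().parsestr(pkg_info_text)
    result = {}
    for key, value in headers.items():
        key = key.lower().replace("-", "_")
        if key in MULTIPLE_USE:
            result.setdefault(key, []).append(value)
        else:
            result[key] = value
    return result

metadata = pkginfo_to_json_dict(
    "Metadata-Version: 2.1\nName: example\nVersion: 1.0\n"
    "Classifier: Programming Language :: Python\n"
)
print(json.dumps(metadata, indent=2))
```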

Philippe Ombredanne
@pombredanne

@thatch re:

how is parse(location) actually called given a directory without PKG-INFO? I'm wondering if there is code somewhere that merges the various items? In the future of pep 621, you might need to read all 3 of pyproject.toml, setup.cfg, setup.py to arrive at the full reality for setuptools metadata, for example. Right now dowsing requires a directory and I'm wondering how to best adapt that.

parse(location) only receives a single path; being able to resolve (and merge) data from multiple manifests, be it for installed or "develop" packages, is not there yet.
There are a few things to consider:

  1. packagedcode for now scans one file at a time and collects manifests. It kinda lies by saying a single manifest equals a package; pyproject.toml, setup.cfg and setup.py are a great example, to which you could add a (pinned or not) requirements file (and the same goes for Gemfile, *.spec and Gemfile.lock, or package.json, package-lock.json, npm-shrinkwrap.json and yarn.lock, and so on)
  2. there are possibly two ways to go at this:
    2.1 one is to allow a parser to wander around from a location and move around to collect and merge data from other files (but there are issues to resolve, as the other files would be scanned on their own, so this would require maintaining quite a bit of state across files at scan time, something we do not do at all for now)
    2.2 the other would be to collect all the file-level data as we do today and assemble/merge it in a post-processing stage (either in scancode-toolkit proper with a postscan plugin, or in scancode.io as a pipeline step, leveraging the database storage in that latter case)

So net-net: this is a conceptually simple problem, but the solution is not entirely trivial (at least on our side, because of the volume and variety of packages we can encounter).

and these... were the short answers :D
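To make option 2.2 concrete, here is a minimal illustrative sketch of a post-processing merge step; the function and field names are hypothetical, not scancode's actual API:

```python
# Hypothetical sketch of option 2.2: keep per-file parsing as-is, then
# merge the file-level package data for one directory in a post-scan
# step, recording which file each field came from.

def merge_manifests(parsed_manifests):
    """Merge (path, data_dict) pairs; the earliest file wins per field,
    and provenance maps each merged field back to its source file."""
    merged = {}
    provenance = {}
    for path, data in parsed_manifests:
        for field, value in data.items():
            if field not in merged and value is not None:
                merged[field] = value
                provenance[field] = path
    return merged, provenance

merged, provenance = merge_manifests([
    ("pyproject.toml", {"name": "pkg", "version": None}),
    ("setup.cfg", {"version": "1.2", "license": "Apache-2.0"}),
])
# merged == {"name": "pkg", "version": "1.2", "license": "Apache-2.0"}
```

Keeping the provenance mapping alongside the merged record preserves the per-file traceability that the current one-file-at-a-time design provides.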
Philippe Ombredanne
@pombredanne
@mhow2 ping wrt the license bug you mentioned yesterday ... reminder to enter a ticket so it does not fall through the cracks ;)
Mohanish Kashiwar
@mk1107
Hello,
I am new to this community and would like to contribute to projects.
I am a beginner with knowledge of c++, python and java.
Can someone guide me on how I can contribute?
Thank you.
Steven Esser
@majurg

@mk1107 Hello and welcome

See: https://aboutcode.readthedocs.io/ for more info on how to get started. Additionally, check out the issue pages on our various repos to find things to work on :)

Tim Hatch
@thatch

2.1 one is to allow a parser to wander around from a location and move around to collect and merge data from other files

In the context of how dowsing could be integrated, it's pretty easy to drop in, as it gives you back a Distribution and you don't care about the non-metadata fields. It does need to look around though: being asked to interpret setup.cfg when the build-backend is poetry is nonsensical, but it doesn't know that without also reading pyproject.toml

so I could modify dowsing to give you single-file knowledge, and merging those isn't that terrible... but it does mean ignoring data would be up to you, if I'm not allowed to look around.

Also, someday it might follow imports e.g. for version, or read files like readme, as directed to by appropriate config. Is that a problem?
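The build-backend dependency described above can be sketched roughly like this (naive string scanning for illustration only; a real tool would use a TOML parser, and these names are hypothetical, not dowsing's API):

```python
# Sketch of the "needs context" problem: whether setup.cfg is worth
# interpreting depends on pyproject.toml's build-backend. Parsing here
# is deliberately naive; use a TOML parser in real code.

def build_backend(pyproject_text):
    """Very rough extraction of build-backend from pyproject.toml text."""
    for line in pyproject_text.splitlines():
        line = line.strip()
        if line.startswith("build-backend"):
            return line.split("=", 1)[1].strip().strip("\"'")
    # Roughly the PEP 517 fallback when no backend is declared.
    return "setuptools.build_meta"

def should_read_setup_cfg(pyproject_text):
    # Interpreting setup.cfg only makes sense for setuptools backends.
    return build_backend(pyproject_text).startswith("setuptools")

assert not should_read_setup_cfg('build-backend = "poetry.core.masonry.api"')
assert should_read_setup_cfg('build-backend = "setuptools.build_meta"')
```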

Mohanish Kashiwar
@mk1107
@majurg thank you
Philippe Ombredanne
@pombredanne
@mk1107 welcome too :wave:
@thatch I do not have a concern with "meandering" around other files in earnest; this is more a matter of reporting the raw data in these cases so we can attribute it to one or more given files. So I think the best for scancode and your use cases could be: dowse alright and gather the data from everywhere, report for a directory, BUT also report/return the raw parsed data collected from each "manifest-like" file we collected data from
Philippe Ombredanne
@pombredanne
I can have scancode scan for directories too
the other thing that I need is to collect which files are package files ... say, for the simpler case of site-packages, that would be the RECORD content (with some caveats, as RECORD "lies" at times about some files, such as .pyc files being listed while they are not there)
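A minimal sketch of reading a dist-info RECORD along those lines, filtering out listed entries (such as .pyc files) that are not actually on disk; this is illustrative only, not scancode's implementation:

```python
# Sketch: list a package's installed files from its dist-info RECORD
# (a CSV of path, hash, size rows), dropping entries that RECORD lists
# but that do not actually exist on disk (e.g. never-created .pyc files).
import csv
import io
import os

def installed_files(record_text, site_packages="."):
    files = []
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue
        path = row[0]  # RECORD rows are: path, hash, size
        if os.path.exists(os.path.join(site_packages, path)):
            files.append(path)
    return files
```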
Philippe Ombredanne
@pombredanne
@thatch which really makes me think that the current approach of scanning one file at a time is flawed. Traceability of data as coming from a file is nice, but not the most important thing. So I think we should completely rework the package manifest handling such that:
  1. a package is a bunch of metadata and a bunch of files (and a package != its package manifest), which really means we should report a package as its own entity and not leave it dangling as a manifest file's child.
  2. package metadata collectors should be left to meander as they please across multiple files. And the contract should be that they must report the list of files they have used (just which ones, not the details of how they were used).
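As an illustration of that proposed contract (hypothetical names, not scancode's real data model), a package entity might look like:

```python
# Rough sketch of a package as its own entity: a bundle of metadata
# plus the files it owns, with the collector declaring which manifest
# files it read. All names here are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Package:
    metadata: Dict[str, str]
    files: List[str] = field(default_factory=list)           # files the package owns
    manifests_used: List[str] = field(default_factory=list)  # manifests the collector read

def collect_python_package(directory_listing):
    # A collector is free to "meander" across several manifests; its
    # contract is only to report which files it actually used.
    manifests = [f for f in directory_listing
                 if f in ("pyproject.toml", "setup.cfg", "setup.py", "PKG-INFO")]
    return Package(
        metadata={"type": "pypi"},
        files=list(directory_listing),
        manifests_used=manifests,
    )

pkg = collect_python_package(["pyproject.toml", "setup.cfg", "src/x.py"])
assert pkg.manifests_used == ["pyproject.toml", "setup.cfg"]
```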
Philippe Ombredanne
@pombredanne
@thatch with these two we get something IMHO much more robust and useful in all cases
  1. for Python, we can holistically dowse on the whole manifest zoo (setup.*, toml, reqs, Pipfile, PKG-INFO, etc.) and report things once, always getting data from the correct one
  2. for Go, we can handle go.mod and go.sum at once, as well as some possibly related holdover manifests
  3. for Ruby, we can handle the Gemfile*, spec
  4. for npm, package.json, yarn and so on, etc.
mucho better IMHO
Abhishek Tiwari
@AbhishekTiwari07
Hello everyone, I am new to this community and I would like to contribute to the projects. Could someone guide me on how I may start contributing?
Tushar Goel
@TG1999
Hi @AbhishekTiwari07 :)
See: https://aboutcode.readthedocs.io/ for more info on how to get started. Additionally, check out the issue pages on our various repos to find things to work on :)
As said by Steven :)
Abhishek Tiwari
@AbhishekTiwari07
Thanks @TG1999
Tim Hatch
@thatch

@pombredanne It would be pretty easy to add a "list of files read" to the API. What about, e.g., if setup.py does

reqs = open("extra_requirements.txt").read().splitlines()
setup(requirements=reqs)

or in setup.cfg:

readme = "README.md"
(if you save the long desc)
would you want those reported as files that are read? Or if they're just a component of the metadata, no?
Tim Hatch
@thatch

sorry bad example; setup.cfg would be

[metadata]
long_description = file: README.md

as in https://github.com/python-packaging/pessimist/blob/main/setup.cfg#L4

https://setuptools.readthedocs.io/en/latest/userguide/declarative_config.html?highlight=file%3A#metadata says that several fields for setuptools can come from either file: or attr: like this
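For reference, the file: directive can be resolved roughly like this sketch (simplified: real setuptools also supports attr: and comma-separated file lists, and the helper names here are illustrative):

```python
# Sketch of resolving setuptools' declarative "file:" directive, as in
#   long_description = file: README.md
# The file reader is injected so callers decide how files are loaded,
# and the files read are reported back, per the discussion above.
import configparser

def resolve_metadata_value(raw, read_file):
    """Return (resolved value, list of files read)."""
    if raw.startswith("file:"):
        filename = raw[len("file:"):].strip()
        return read_file(filename), [filename]
    return raw, []

cfg = configparser.ConfigParser()
cfg.read_string("[metadata]\nlong_description = file: README.md\n")
raw = cfg["metadata"]["long_description"]
value, files_read = resolve_metadata_value(raw, lambda f: "# Pessimist\n")
assert files_read == ["README.md"]
```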
Philippe Ombredanne
@pombredanne
@thatch I replied in a thread (threads are a little bit weird in Gitter)
Gaurav Singh
@GauravSingh9356
Hello everyone. I am new to this community and I would like to contribute to the projects. Please guide me. I have worked with python as well as javascript.
Philippe Ombredanne
@pombredanne

@GauravSingh9356 welcome :wave:
See: https://aboutcode.readthedocs.io/ for more info on how to get started. Additionally, check out the issue pages on our various repos to find things to work on :)

(as said by @TG1999 ;) )

Tushar Goel
@TG1999
:p
Mohanish Kashiwar
@mk1107
Hello, I am new to this project and working on nexB/scancode-toolkit#2299
I went through all the given links, but I am unable to understand what I need to do. So, if possible, can you guide me further?
Philippe Ombredanne
@pombredanne
@/all I pushed the final 3.2.x release of scancode-toolkit :)
Tushar Goel
@TG1999
@pombredanne please review nexB/fetchcode#46; this is regarding setup files, so we can move towards publishing the fetchcode package and later integrating it with scancode.io
Philippe Ombredanne
@pombredanne
@TG1999 can you sync up with @majurg, as we are trying to adopt a new "skeleton" approach to base files: https://github.com/jaraco/skeleton
Steven Esser
@majurg

@TG1999 Take a look at this brief documentation here https://github.com/nexB/skeleton/blob/develop/README.rst

Take a shot at implementing this. If you have questions, I can guide you on the specifics.

meer
@mirnumaan
hi
I am Numaan Bashir Mir. I want to work on a web development project for your organisation and make the project as effective as possible. How can I get started?
Tushar Goel
@TG1999
Hi @majurg the link is broken, can you please check again :)
Tushar Goel
@TG1999
And can we schedule a call today, so we can move things along fast and publish fetchcode ASAP?
Steven Esser
@majurg
@TG1999 the link should now be accessible
Tushar Goel
@TG1999
Cool, thanks @majurg, will be making changes in the PR soon :)
Tushar Goel
@TG1999
@majurg @pombredanne I have made the required changes please have a look :) nexB/fetchcode#46
mahi0601
@mahi0601
Hi, I am very new to open source. Please, can someone help me understand how to start contributing to open source?
Tushar Goel
@TG1999
Changes done in nexB/fetchcode#46. I cannot understand why it is failing on CI; I have replicated the steps told to me. Please have a look :)
Steven Esser
@majurg
@TG1999 Taking a look now...
Steven Esser
@majurg
@TG1999 feedback left
Tushar Goel
@TG1999
@majurg thanks for those changes. Now there are some new problems: CI is still failing and I am not able to figure out the reason. Tests were passing previously and still pass when I run pytest, but I cannot figure out why these errors are coming up. Please have a look :)
Steven Esser
@majurg
@TG1999 There seems to be some import issue, perhaps due to some code reorg? What happens when you do the following?:
$ ./configure
$ tmp/bin/pytest
Shivam Sandbhor
@sbs2001
@pombredanne I'm guessing we need to 'skeletonize' vulnerablecode too .
homeboy445
@homeboy445
Hi there, I am a newbie to open source contribution and super excited to contribute to this organization. Could use some help to get started. Thanks a lot!