I have read the dataflows / data-factory blog posts and docs, but have never used them. We use GitLab CI pipelines with "download" and "convert" jobs.
Example for World Bank: https://git.nomics.world/dbnomics-fetchers/wb-fetcher/-/jobs
We built a dedicated dashboard (fetching data from the GitLab API): https://db.nomics.world/dashboard/
DBnomics jobs are Python scripts (https://git.nomics.world/dbnomics-fetchers/wb-fetcher). They don't follow a common abstraction (like an AbstractClass or such), but a common pattern:
The source code of the jobs isn't committed with data.
Sometimes the "convert" jobs use Pandas, sometimes the Json-Stat Python module, sometimes lxml directly or xlrd (for Excel), or other tools.
We really like the fact that jobs write files and that we keep the history via Git (rather than writing directly in a database for example, in a more traditional approach). But using Git with such amounts of data doesn't come without problems (slow commits, slow pulls, excessive RAM consumption on GitLab server, etc.).
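To make the "jobs write files" idea concrete, here is a minimal sketch of what a "convert" step could look like. It is not taken from the DBnomics fetchers: the function name `convert`, the CSV input, and the one-JSON-file-per-input layout are all assumptions made for illustration.

```python
import csv
import json
from pathlib import Path
from typing import List


def convert(source_dir: Path, target_dir: Path) -> List[Path]:
    """Hypothetical "convert" job: turn each downloaded CSV into one JSON file."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort the inputs so that repeated runs process files in the same order.
    for csv_path in sorted(source_dir.glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        json_path = target_dir / (csv_path.stem + ".json")
        # sort_keys and a fixed indent keep the output byte-stable between runs.
        json_path.write_text(
            json.dumps(rows, indent=2, sort_keys=True) + "\n", encoding="utf-8"
        )
        written.append(json_path)
    return written
```

Because the output is deterministic, unchanged source data produces unchanged files, so a Git commit of the target directory after each run would only record real changes.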
We have two other jobs (visible in the dashboard):
master branch) of each "json-data" repository
@cbenz really agree with you about patterns. In my experience of building stuff this is exactly how things start out: you have a download and extract/convert method or script.
DataFlows is just a way of having that common pattern, plus a little bit of standardization and a library of common processors. If you have a moment to look at it, it would be great to have your thoughts: https://github.com/datahq/dataflows
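As a rough illustration of "a common pattern plus a library of common processors", here is a plain-Python sketch. It is not the DataFlows API: `run_pipeline`, `rename_field`, and `cast_float` are hypothetical names used only to show the shape of the idea, where each processor takes an iterable of rows and yields rows.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Processor = Callable[[Iterable[Row]], Iterator[Row]]


def run_pipeline(rows: Iterable[Row], *processors: Processor) -> List[Row]:
    """Chain the processors in order and collect the resulting rows."""
    for processor in processors:
        rows = processor(rows)
    return list(rows)


def rename_field(old: str, new: str) -> Processor:
    """Reusable processor (hypothetical): rename one field in every row."""
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            row = dict(row)  # copy so the caller's row is not mutated
            row[new] = row.pop(old)
            yield row
    return step


def cast_float(field: str) -> Processor:
    """Reusable processor (hypothetical): convert one field to float."""
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, field: float(row[field])}
    return step


result = run_pipeline(
    [{"period": "2017", "val": "1.5"}],
    rename_field("val", "value"),
    cast_float("value"),
)
```

Each source-specific script would then mostly be a list of such steps, with the shared ones living in a common library.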
Hi! I need a data set that has just over 25 data points for a school project and I really don't know how to find one...please help
@ellieszul_twitter hey, just take any of these - https://datahub.io/core
Hi! Is there a way (data-cli or other) to suggest a data package for Dataset Collections? or how to contact Dataset Collection's editor panel?
@andrejjh Hi! Do you want to suggest changes on collections page? If so, you just need to open a pull request here https://github.com/datahq/awesome-data