Lynn Greenwood
@lynngre_twitter
Ahha, thanks a million
Christophe Benz
@cbenz
Hi, I'm the lead developer on db.nomics.world
I'm convinced we have common interests (economics, open data, free software) and should work together, especially with regard to discoverability (easing adoption, etc.).
We'd love to catch up with you on our projects. @rufuspollock What could we start with, from a technical point of view?
Rufus Pollock
@rufuspollock

@cbenz one point of common interest would be how we’re building our data pipelines and what we could learn. We’ve been working a lot on a simple framework called “dataflows” built around tabular data packages and then running those in travis or gitlab runners if small (or in datahub itself as part of the SaaS data factory): https://github.com/datahq/dataflows - https://datahub.io/data-factory

In terms of monitoring, we currently have a monitoring and reporting system for dataflows run as part of datahub itself - but nothing for the travis/gitlab ones ...

@cbenz How are you guys building your pipelines, running them and monitoring them?
Christophe Benz
@cbenz

I have read dataflows / data-factory blog posts and docs; never used them. We use GitLab CI pipelines with "download" and "convert" jobs.
Example for World Bank: https://git.nomics.world/dbnomics-fetchers/wb-fetcher/-/jobs
We built a dedicated dashboard (fetching data in GitLab API): https://db.nomics.world/dashboard/
DBnomics jobs are Python scripts (https://git.nomics.world/dbnomics-fetchers/wb-fetcher). They don't follow a common abstraction (like an AbstractClass or such), but a common pattern:

The source code of the jobs isn't committed with data.

Sometimes the "convert" jobs use Pandas, sometimes the JSON-stat Python module, sometimes lxml directly, xlrd (Excel), or other ways.

We really like the fact that jobs write files and that we keep the history via Git (rather than writing directly in a database for example, in a more traditional approach). But using Git with such amounts of data doesn't come without problems (slow commits, slow pulls, excessive RAM consumption on GitLab server, etc.).

We have two other jobs (visible in the dashboard):

Rufus Pollock
@rufuspollock

@cbenz really agree with you about patterns. In my experience of building stuff this is exactly how things start out: you have a download and extract/convert method or script.

All DataFlows is is a way of having that common pattern plus a little bit of standardization and a library of common processors. If you had a moment to look at this it would be great to have your thoughts: https://github.com/datahq/dataflows
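
The idea of "a common pattern plus a library of common processors" can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the dataflows API, and the names `download`, `drop_missing`, `cast_float` and `run_pipeline`, as well as the sample rows, are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for a "download" job: a real fetcher would pull
# this text from a provider's API instead of using a string literal.
RAW_CSV = "country,year,value\nFR,2017,1.5\nDE,2017,\nFR,2018,1.7\n"


def download():
    """Yield raw rows as dicts (the 'download' half of the pattern)."""
    yield from csv.DictReader(io.StringIO(RAW_CSV))


def drop_missing(rows, field):
    """Reusable processor: skip rows where `field` is empty."""
    for row in rows:
        if row[field] != "":
            yield row


def cast_float(rows, field):
    """Reusable processor: convert `field` from text to float."""
    for row in rows:
        yield {**row, field: float(row[field])}


def run_pipeline(source, *steps):
    """Chain processors over a row stream and collect the result."""
    rows = source
    for step in steps:
        rows = step(rows)
    return list(rows)


# The 'convert' half: the same small processors can be reused by every fetcher.
result = run_pipeline(
    download(),
    lambda rows: drop_missing(rows, "value"),
    lambda rows: cast_float(rows, "value"),
)
```

Each processor takes a stream of rows and returns a stream of rows, so a fetcher-specific script shrinks to a source plus a list of shared steps.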

Rufus Pollock
@rufuspollock
@cbenz did you see above ^^^? /cc @johanricher :smile:
JayPeaa
@JayPeaa
Hi
Wondering if someone can help
Rufus Pollock
@rufuspollock
@JayPeaa hi
JayPeaa
@JayPeaa
Hi
I'm a student working on a project in which I need to build a dashboard using DC.js... and I picked global inequality as my topic.
Rufus Pollock
@rufuspollock
@JayPeaa sounds great!
JayPeaa
@JayPeaa
I'm trying to get my hands on a dataset which will enable me to build an interactive dashboard rather than just individual graphs.
@rufuspollock thanks :)
I'm wondering if datahub can help with providing some data?
I may have picked a challenging topic.
Rufus Pollock
@rufuspollock
@JayPeaa i think we can definitely help - i’d start looking at https://datahub.io/collections/wealth-income-and-inequality
JayPeaa
@JayPeaa
Thanks
I think the challenge is I'm not 100% certain on what it is I'm looking for until I see it.
I'll check out that link.
Rufus Pollock
@rufuspollock
@JayPeaa we also want to add to that page a link and summary to the World Inequality Database - https://wid.world/
JayPeaa
@JayPeaa
A lot of the data I find has already been summarised too and I'm looking for raw data to be able to manipulate it myself, preferably in CSV format.
JayPeaa
@JayPeaa
@rufuspollock Hi, everything already seems to have been summarised into tables. Am I not looking in the right places? I just need raw data that gives me the potential to interrogate it in multiple ways using various dimensions. Any ideas on where I could get this? I don't need tables, just the raw data. I'm probably not looking in the right place or missing a link or something.
Rufus Pollock
@rufuspollock
What do you mean by raw data?
JayPeaa
@JayPeaa
The underlying data
Rufus Pollock
@rufuspollock
What do you mean by that exactly - can you give an example vs e.g. the wid.world data ...
JayPeaa
@JayPeaa
So if 1000 lines equal the total I need the 1000 lines
not the total.
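
The difference between the underlying rows and a pre-computed total can be sketched like this. The records below are invented purely for illustration (they are not taken from wid.world or datahub), and `total_by` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical row-level records, invented for illustration only.
raw_rows = [
    {"country": "A", "year": 2015, "group": "top 10%", "income": 60},
    {"country": "A", "year": 2015, "group": "bottom 90%", "income": 40},
    {"country": "B", "year": 2015, "group": "top 10%", "income": 30},
    {"country": "B", "year": 2015, "group": "bottom 90%", "income": 70},
]


def total_by(rows, dimension, measure="income"):
    """Sum `measure` for each distinct value of `dimension`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


# With the rows available, the same data can be regrouped along any dimension.
by_country = total_by(raw_rows, "country")
by_group = total_by(raw_rows, "group")
```

A table that only publishes `by_country` cannot be turned back into `by_group`, which is why an interactive dashboard needs the rows rather than the totals.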
Rufus Pollock
@rufuspollock
Can you give a specific example against one of the specific datasets ...
JayPeaa
@JayPeaa
@rufuspollock I think I need to figure out what I want to show and come back to you.
Thanks for your help.
JayPeaa
@JayPeaa
I tried downloading some data from wid.world but it doesn't make sense when I import it into Excel.
image.png
See headings in yellow.
Tito Jankowski
@titojankowski
@rufuspollock a permanent home sounds great, let’s make it happen!
Tito Jankowski
@titojankowski
excited to see the next revision of the site!
i looked through all the github issues, looks like things are moving along
el
@ellieszul_twitter
Hi! I need a data set that has just over 25 data points for a school project and I really don't know how to find one...please help
André Heughebaert
@andrejjh
Hi! Is there a way (data-cli or other) to suggest a data package for Dataset Collections? or how to contact Dataset Collection's editor panel?
Xeverus01
@Xeverus01
Hey guys, I think the feed here has stopped. It was updating daily and isn't anymore. datahub.io/core/co2-ppm
Anuar Ustayev
@anuveyatsu

Hi! I need a data set that has just over 25 data points for a school project and I really don't know how to find one...please help

@ellieszul_twitter hey, just take any of these - https://datahub.io/core

Hi! Is there a way (data-cli or other) to suggest a data package for Dataset Collections? or how to contact Dataset Collection's editor panel?

@andrejjh Hi! Do you want to suggest changes on collections page? If so, you just need to open a pull request here https://github.com/datahq/awesome-data

@Xeverus01 Hi! I don’t think it was updated on a daily basis. You probably meant this one https://datahub.io/core/co2-ppm-daily
Ghezal Ahmad Zia
@ghezalahmad
Hi, I want to do relation extraction on the Persian language. Does YAGO support Persian?
Maria La Del Kafrio
@Anaisthiti_twitter
Hello, I want a dataset that contains, apart from the data itself, its provenance information too.
Tito Jankowski
@titojankowski
@rufuspollock heya, how’s it going on your end with carbon.datahub.io?
Rufus Pollock
@rufuspollock
@titojankowski we’re doing good - it is largely good to go from our end ...
Tito Jankowski
@titojankowski
@rufuspollock thanks for the update! is that latest build reflected on https://carbon.datahub.io ?
Tito Jankowski
@titojankowski
to me the next step is to redirect carbondoomsday.com there, what do you think?