Rufus Pollock
@rufuspollock
@titojankowski :smile: - yes, we switched to the standard graphing system we’ve used on datahub.io as it’s easier to maintain going forward (we think). We loved what you guys already had and have tried to imitate it as closely as possible whilst moving away from custom SVG :wink:
Michael Brunnbauer
@michaelbrunnbauer
Regarding the Imagesnippets dataset (https://old.datahub.io/dataset/imagesnippets): What do you mean exactly by "republish on datahub.io". I cannot find anything to submit dataset metadata - only stuff to upload data. Are we supposed to make an out of sync copy of our triples here?
Rufus Pollock
@rufuspollock

Regarding the Imagesnippets dataset (https://old.datahub.io/dataset/imagesnippets): What do you mean exactly by "republish on datahub.io". I cannot find anything to submit dataset metadata - only stuff to upload data. Are we supposed to make an out of sync copy of our triples here?

So the new datahub can do “metadata” only - you’d need to create a datapackage.json with an empty resources array and push that. If you want you can do that :smile: - or you can push the dataset itself if that is possible (e.g. if it is bulk and reasonably static).
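
As a concrete sketch, a metadata-only datapackage.json along those lines could be produced like this (the name, title and homepage values are placeholders, not the actual Imagesnippets metadata):

```python
# Minimal sketch of a metadata-only Data Package descriptor.
# name/title/homepage are placeholders, not the real Imagesnippets values.
import json

descriptor = {
    "name": "imagesnippets",   # hypothetical package name
    "title": "ImageSnippets",  # hypothetical title
    "homepage": "https://old.datahub.io/dataset/imagesnippets",
    "resources": [],           # empty array: metadata only, no data pushed
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```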

Tito Jankowski
@titojankowski
@rufuspollock nice thinking ahead!
Rufus Pollock
@rufuspollock
@titojankowski we’ve got a little more polishing to do. Once that is done what do you think of organizing a redirect - that way this will have a permanent reliable home url?
Michael Brunnbauer
@michaelbrunnbauer
@rufuspollock the old datahub has a download link for the datapackage.json but all the interesting metadata is in the resources array: https://old.datahub.io/dataset/imagesnippets/datapackage.json You said the resources array has to be empty but in that case I would not be able to provide a single link (to triple dump, SPARQL endpoint, dataset homepage, etc.). Are you sure about it?
@rufuspollock Also I cannot find any link to upload that file. Do I have to install any of your software? Would the dataset be findable by other users of datahub.io after uploading the metadata (e.g. in https://datahub.io/search)?
Anuar Ustayev
@anuveyatsu

@rufuspollock Also I cannot find any link to upload that file. Do I have to install any of your software? Would the dataset be findable by other users of datahub.io after uploading the metadata (e.g. in https://datahub.io/search)?

Hi @michaelbrunnbauer yes, you need to install the data CLI tool to publish datasets - https://datahub.io/download. Once it is published, it will be findable by other users.

Rufus Pollock
@rufuspollock

https://old.datahub.io/dataset/imagesnippets/datapackage.json You said the resources array has to be empty but in that case I would not be able to provide a single link (to triple dump, SPARQL endpoint, dataset homepage, etc.). Are you sure about it?

You could add links to the remote resources in the resources array - that should work, I think.
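
That variant would look something like this, assuming the Frictionless convention that a resource path may be a remote URL (resource name, format and URL are placeholders for the real triple dump, SPARQL endpoint, etc.):

```python
# Sketch: resources pointing at remote URLs instead of uploaded files.
# Resource name, format and URL are placeholders.
import json

descriptor = {
    "name": "imagesnippets",
    "resources": [
        {
            "name": "triple-dump",                                   # hypothetical
            "path": "https://example.org/imagesnippets/dump.nt.gz",  # placeholder
            "format": "nt",
        }
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```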

Lynn Greenwood
@lynngre_twitter
Hi all, I just found datahub.io and wondered if anyone could let me know if it's possible to pick up the open datasets using the API. How do we go about that without an API key, or is it just an open read API?
Anuar Ustayev
@anuveyatsu
@lynngre_twitter Hi there! You can take a look at this tutorial - https://datahub.io/docs/getting-started/getting-data
Lynn Greenwood
@lynngre_twitter
Thanks!
Lynn Greenwood
@lynngre_twitter
Ok, sorry, I'm not a developer. I still don't see how I can pick up specific datasets with the API without having an API key or OAuth of some kind.
We want to pick up open datasets and display them in widgets that click through to datahub from a website.
Anuar Ustayev
@anuveyatsu
@lynngre_twitter once you locate a dataset you can use its r links, e.g., if you want to get this dataset https://datahub.io/core/finance-vix you’d use the following URLs:
also, on each dataset page, you can see the “Integrate this dataset into your favourite tool” section - see https://datahub.io/core/finance-vix#curl
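As a rough illustration, this stdlib-only snippet reads the first rows of finance-vix over such an r link, with no API key (the /r/0.csv path and the resource index are assumptions taken from the “Integrate this dataset” pattern - check the dataset page for the real links):

```python
# Minimal sketch: fetch a datahub.io dataset via its r link, no API key.
# The /r/0.csv form and the resource index are assumptions - verify on
# the dataset page's "Integrate this dataset" section.
import csv
import io
import urllib.request

url = "https://datahub.io/core/finance-vix/r/0.csv"
with urllib.request.urlopen(url) as resp:
    reader = csv.reader(io.TextIOWrapper(resp, encoding="utf-8"))
    print(next(reader))  # header row (column names)
    print(next(reader))  # first data row
```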
Lynn Greenwood
@lynngre_twitter
Ahha, thanks a million
Christophe Benz
@cbenz
Hi, I'm the lead developer on db.nomics.world
I'm convinced we have common interests (economics, open data, free software) and should work together, especially with regard to discoverability (easing adoption, etc.).
We'd love to catch up with you on our projects. @rufuspollock What could we start with, from a technical point of view?
Rufus Pollock
@rufuspollock

@cbenz one point of common interest would be how we’re building our data pipelines and what we could learn from each other. We’ve been working a lot on a simple framework called “dataflows”, built around tabular data packages, and then running those in Travis or GitLab runners if small (or in datahub itself as part of the SaaS data factory): https://github.com/datahq/dataflows - https://datahub.io/data-factory (a toy sketch follows below)

In terms of monitoring, we currently have a monitoring and reporting system for dataflows run as part of datahub itself - but nothing for the Travis/GitLab ones ...

@cbenz How are you guys building your pipelines, running them and monitoring them?
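
For readers who haven't used dataflows, a toy pipeline in its spirit looks roughly like this (the source URL and the added field are illustrative, not from the actual datahub pipelines):

```python
# A toy dataflows pipeline in the spirit of the framework linked above.
# The source URL and the added field are placeholders.
from dataflows import Flow, load, add_field, dump_to_path

Flow(
    load("https://example.org/data.csv"),      # placeholder tabular source
    add_field("source", "string", "example"),  # tag every row with provenance
    dump_to_path("out"),                       # writes out/datapackage.json + data
).process()
```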
Christophe Benz
@cbenz

I have read the dataflows / data-factory blog posts and docs, but never used them. We use GitLab CI pipelines with "download" and "convert" jobs.
Example for World Bank: https://git.nomics.world/dbnomics-fetchers/wb-fetcher/-/jobs
We built a dedicated dashboard (fetching data from the GitLab API): https://db.nomics.world/dashboard/
DBnomics jobs are Python scripts (https://git.nomics.world/dbnomics-fetchers/wb-fetcher). They don't follow a common abstraction (like an abstract class or such), but a common pattern (see the sketch below):

The source code of the jobs isn't committed with data.

Sometimes the "convert" jobs use Pandas, sometimes the JSON-stat Python module, sometimes lxml directly, xlrd (for Excel), or other approaches.
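
A rough sketch of that download/convert pattern, with illustrative names, URL and paths rather than DBnomics' actual fetcher code:

```python
# Sketch of the "download" then "convert" job pattern described above.
# URL, file names and directory layout are illustrative placeholders.
import json
import pathlib
import urllib.request

SOURCE_URL = "https://example.org/provider-api/data.json"  # placeholder

def download(source_dir: pathlib.Path) -> None:
    # "download" job: fetch provider data as-is into a source-data tree
    source_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(SOURCE_URL) as resp:
        (source_dir / "raw.json").write_bytes(resp.read())

def convert(source_dir: pathlib.Path, target_dir: pathlib.Path) -> None:
    # "convert" job: normalise the raw files into the target layout
    target_dir.mkdir(parents=True, exist_ok=True)
    raw = json.loads((source_dir / "raw.json").read_text())
    (target_dir / "dataset.json").write_text(json.dumps(raw, indent=2))

if __name__ == "__main__":
    download(pathlib.Path("source-data"))
    convert(pathlib.Path("source-data"), pathlib.Path("json-data"))
```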

We really like the fact that jobs write files and that we keep the history via Git (rather than writing directly into a database, as in a more traditional approach). But using Git with such volumes of data doesn't come without problems (slow commits, slow pulls, excessive RAM consumption on the GitLab server, etc.).

We have two other jobs (visible in the dashboard):

Rufus Pollock
@rufuspollock

@cbenz really agree with you about patterns. In my experience of building stuff this is exactly how things start out: you have a download and extract/convert method or script.

All DataFlows is, really, is a way of having that common pattern plus a little bit of standardization and a library of common processors. If you had a moment to look at it, it would be great to have your thoughts: https://github.com/datahq/dataflows

Rufus Pollock
@rufuspollock
@cbenz did you see above ^^^? /cc @johanricher :smile:
JayPeaa
@JayPeaa
Hi
Wondering if someone can help
Rufus Pollock
@rufuspollock
@JayPeaa hi
JayPeaa
@JayPeaa
Hi
I'm a student working on a project in which I need to build a dashboard using DC.js... but I picked global inequality as my topic.
Rufus Pollock
@rufuspollock
@JayPeaa sounds great!
JayPeaa
@JayPeaa
I'm trying to get my hands on a dataset which will enable me to build an interactive dashboard rather than just individual graphs.
@rufuspollock thanks :)
I'm wondering if datahub can help with providing some data?
I may have picked a challenging topic.
Rufus Pollock
@rufuspollock
@JayPeaa I think we can definitely help - I’d start looking at https://datahub.io/collections/wealth-income-and-inequality
JayPeaa
@JayPeaa
Thanks
I think the challenge is I'm not 100% certain on what it is I'm looking for until I see it.
I'll check out that link.
Rufus Pollock
@rufuspollock
@JayPeaa we also want to add to that page a link to, and summary of, the World Inequality Database - https://wid.world/
JayPeaa
@JayPeaa
A lot of the data I find has already been summarised, too, and I'm looking for raw data to be able to manipulate it myself, preferably in CSV format.
JayPeaa
@JayPeaa
@rufuspollock Hi, everything already seems to have been summarised into tables - am I not looking in the right places? I just need raw data that gives me the potential to interrogate it in multiple ways, using various different dimensions. Any ideas on where I could get this? I don't need tables, just the raw data. I'm probably not looking in the right place or missing a link or something.
Rufus Pollock
@rufuspollock
What do you mean raw data?
JayPeaa
@JayPeaa
The underlying data
Rufus Pollock
@rufuspollock
What do you mean by that exactly - can you give an example vs e.g. the wid.world data ...
JayPeaa
@JayPeaa
So if 1000 lines add up to the total, I need the 1000 lines,
not the total.
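
To make "raw" concrete, a toy illustration with invented numbers:

```python
# Illustrative only: "raw" observation rows versus a pre-summarised total.
raw_rows = [
    {"country": "A", "year": 2017, "income": 100},
    {"country": "A", "year": 2018, "income": 250},
    {"country": "B", "year": 2018, "income": 400},
]

# A summary table only gives you something like this:
total = sum(r["income"] for r in raw_rows)  # 750

# With the raw rows you can re-aggregate along any dimension yourself:
by_year = {}
for r in raw_rows:
    by_year[r["year"]] = by_year.get(r["year"], 0) + r["income"]
print(by_year)  # {2017: 100, 2018: 650}
```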
Rufus Pollock
@rufuspollock
Can you give a specific example against one of the specific datasets ...
JayPeaa
@JayPeaa
@rufuspollock I think I need to figure out what I want to show and come back to you.
Thanks for your help.
JayPeaa
@JayPeaa
I tried downloading some data from wid.world but it doesn't make sense when I import it into Excel?
[image attachment: image.png]