
Rufus Pollock
@rufuspollock
@mel_kumar_gitlab sure! You can see https://datahub.io/docs/about
Mahmoud
@GeniusItech_twitter
hi all
Rufus Pollock
@rufuspollock
hi
zeroexp
@zeroexp
Hi - we had a dataset published on the old datahub.io but I cannot find it on the new datahub.io -- is there something I need to do on my end to have it included and findable on the new datahub.io site?
The link to the dataset on old.datahub.io is: https://old.datahub.io/dataset?q=Imagesnippets+
Rufus Pollock
@rufuspollock
@zeroexp yes, you will need to republish on datahub.io if you want it to show up there. Please go for that and let us know if you have any issues.

Carbon Doomsday update

@titojankowski we’ve been busy working on a port of your carbondoomsday - it is still in progress but you can see how it looks here https://carbon.datahub.io/

If you or any colleagues want to help out the repo is here https://github.com/datahq/carbondoomsday

Tito Jankowski
@titojankowski
@rufuspollock that’s terrific! wow
look at that! cool
glad to see the port is up and running!
any questions I can help answer for the team?
Tito Jankowski
@titojankowski
looks like a ton of work went into it, completely new graphing system
Rufus Pollock
@rufuspollock
@titojankowski :smile: - yes, we switched to the standard graphing system we’ve used on datahub.io as it’s easier to maintain going forward (we think). We loved what you guys already had and have tried to imitate it as closely as possible whilst moving away from custom SVG :wink:
Michael Brunnbauer
@michaelbrunnbauer
Regarding the Imagesnippets dataset (https://old.datahub.io/dataset/imagesnippets): what exactly do you mean by "republish on datahub.io"? I cannot find anything for submitting dataset metadata - only stuff for uploading data. Are we supposed to make an out-of-sync copy of our triples here?
Rufus Pollock
@rufuspollock

Regarding the Imagesnippets dataset (https://old.datahub.io/dataset/imagesnippets): what exactly do you mean by "republish on datahub.io"? I cannot find anything for submitting dataset metadata - only stuff for uploading data. Are we supposed to make an out-of-sync copy of our triples here?

So the new datahub can do “metadata” only - you’d need to create a datapackage.json with an empty resources array and push that. If you want you can do that :smile: - or you can push the dataset itself if that is possible (e.g. if it is bulk and reasonably static).

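(For illustration: a minimal sketch of the metadata-only datapackage.json Rufus describes. The name and title here are guesses based on the dataset being discussed, not from the chat; the point is the empty resources array.)

```json
{
  "name": "imagesnippets",
  "title": "ImageSnippets",
  "resources": []
}
```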
Tito Jankowski
@titojankowski
@rufuspollock nice thinking ahead!
Rufus Pollock
@rufuspollock
@titojankowski we’ve got a little more polishing to do. Once that is done, what do you think of organizing a redirect - that way this will have a permanent, reliable home URL?
Michael Brunnbauer
@michaelbrunnbauer
@rufuspollock the old datahub has a download link for the datapackage.json, but all the interesting metadata is in the resources array: https://old.datahub.io/dataset/imagesnippets/datapackage.json You said the resources array has to be empty, but in that case I would not be able to provide a single link (to the triple dump, SPARQL endpoint, dataset homepage, etc.). Are you sure about that?
@rufuspollock Also I cannot find any link to upload that file. Do I have to install any of your software? Would the dataset be findable by other users of datahub.io after uploading the metadata (e.g. in https://datahub.io/search)?
Anuar Ustayev
@anuveyatsu

@rufuspollock Also I cannot find any link to upload that file. Do I have to install any of your software? Would the dataset be findable by other users of datahub.io after uploading the metadata (e.g. in https://datahub.io/search)?

Hi @michaelbrunnbauer yes, you need to install the data CLI tool to publish datasets - https://datahub.io/download. Once a dataset is published, it will be findable by other users.

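(As a sketch of the workflow Anuar points to, using the data CLI from https://datahub.io/download - assuming you run it from the directory containing your datapackage.json:)

```sh
# authenticate once, then publish the dataset in the current directory
data login
data push
```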
Rufus Pollock
@rufuspollock

https://old.datahub.io/dataset/imagesnippets/datapackage.json You said the resources array has to be empty, but in that case I would not be able to provide a single link (to the triple dump, SPARQL endpoint, dataset homepage, etc.). Are you sure about that?

You could add links to the remote resources in the resources array - that should work, I think.

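(A sketch of what Rufus suggests: resource entries whose path points at remote URLs. The URLs, names, and formats below are placeholders, not the real Imagesnippets links.)

```json
{
  "name": "imagesnippets",
  "resources": [
    {
      "name": "triple-dump",
      "path": "https://example.org/imagesnippets/dump.nt",
      "format": "nt"
    },
    {
      "name": "sparql-endpoint",
      "path": "https://example.org/sparql",
      "format": "html"
    }
  ]
}
```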
Lynn Greenwood
@lynngre_twitter
Hi all, I just found datahub.io & wondered if anyone could let me know if it's possible to pick up the open data sets using the API, & if so, how do we go about that without an API key - or is it just an open read API?
Anuar Ustayev
@anuveyatsu
@lynngre_twitter Hi there! You can take a look at this tutorial - https://datahub.io/docs/getting-started/getting-data
Lynn Greenwood
@lynngre_twitter
Thanks!
Lynn Greenwood
@lynngre_twitter
Ok, sorry, I'm not a developer. I still don't see how I can pick up specific datasets with the API without having an API key or OAuth of some kind.
We want to pick up open data sets & display them in widgets that click through to datahub from a website.
Anuar Ustayev
@anuveyatsu
@lynngre_twitter once you locate a dataset you can use r links, e.g., if you want to get this dataset https://datahub.io/core/finance-vix you’d use the following URLs:
also, on each dataset page, you can see the “Integrate this dataset into your favourite tool” section, see https://datahub.io/core/finance-vix#curl
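(For illustration, the r-link pattern Anuar mentions looks like this - the resource name and index here are examples; the exact ones are listed on each dataset page:)

```sh
# fetch a resource by name or by index; -L follows datahub.io redirects
curl -L https://datahub.io/core/finance-vix/r/vix-daily.csv
curl -L https://datahub.io/core/finance-vix/r/0.csv
```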
Lynn Greenwood
@lynngre_twitter
Ahha, thanks a million
Christophe Benz
@cbenz
Hi, I'm the lead developer on db.nomics.world
I'm convinced we have common interests (economics, open data, free software) and should work together, especially with regard to discoverability (easing adoption, etc.).
We'd love to catch up with you on our projects. @rufuspollock What could we start with, from a technical point of view?
Rufus Pollock
@rufuspollock

@cbenz one point of common interest would be how we’re building our data pipelines and what we could learn from each other. We’ve been working a lot on a simple framework called “dataflows”, built around tabular data packages, and then running those in Travis or GitLab runners if small (or in datahub itself as part of the SaaS data factory): https://github.com/datahq/dataflows - https://datahub.io/data-factory

In terms of monitoring, we currently have a monitoring and reporting system for dataflows run as part of datahub itself - but nothing for the Travis/GitLab ones ...

@cbenz How are you guys building your pipelines, running them and monitoring them?
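(For illustration: a minimal DataFlows pipeline in the shape Rufus describes. The source URL and the processor are hypothetical; Flow, load, and dump_to_path are the library's basic building blocks, per https://github.com/datahq/dataflows.)

```python
from dataflows import Flow, load, dump_to_path

# a plain Python function acts as a row processor in a Flow
def normalize(row):
    # 'country' is a hypothetical column in the hypothetical source CSV
    row['country'] = row['country'].strip().upper()

Flow(
    load('https://example.com/data.csv'),  # hypothetical source URL
    normalize,
    dump_to_path('out'),  # writes out/datapackage.json plus the processed data
).process()
```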
Christophe Benz
@cbenz

I have read the dataflows / data-factory blog posts and docs; never used them. We use GitLab CI pipelines with "download" and "convert" jobs.
Example for the World Bank: https://git.nomics.world/dbnomics-fetchers/wb-fetcher/-/jobs
We built a dedicated dashboard (fetching data from the GitLab API): https://db.nomics.world/dashboard/
DBnomics jobs are Python scripts (https://git.nomics.world/dbnomics-fetchers/wb-fetcher). They don't follow a common abstraction (like an abstract class or such), but a common pattern:

The source code of the jobs isn't committed with data.

Sometimes the "convert" jobs use pandas, sometimes the JSON-stat Python module, sometimes lxml directly or xlrd (Excel), or other ways.

We really like the fact that jobs write files and that we keep the history via Git (rather than writing directly to a database, for example, in a more traditional approach). But using Git with such large amounts of data doesn't come without problems (slow commits, slow pulls, excessive RAM consumption on the GitLab server, etc.).

We have two other jobs (visible in the dashboard):

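(For readers following along, a minimal sketch of the "download"/"convert" job split Christophe describes. The stage names come from his message; the script names and artifact paths are hypothetical.)

```yaml
stages:
  - download
  - convert

download:
  stage: download
  script:
    - python download.py        # fetch raw files from the source (hypothetical script)
  artifacts:
    paths:
      - raw/                    # hand the raw files on to the convert job

convert:
  stage: convert
  script:
    - python convert.py raw/    # convert raw files to the target format (hypothetical script)
```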
Rufus Pollock
@rufuspollock

@cbenz really agree with you about patterns. In my experience of building stuff this is exactly how things start out: you have a download and an extract/convert method or script.

DataFlows is just a way of having that common pattern plus a little bit of standardization and a library of common processors. If you had a moment to look at it, it would be great to have your thoughts: https://github.com/datahq/dataflows

Rufus Pollock
@rufuspollock
@cbenz did you see above ^^^? /cc @johanricher :smile:
JayPeaa
@JayPeaa
Hi
Wondering if someone can help
Rufus Pollock
@rufuspollock
@JayPeaa hi
JayPeaa
@JayPeaa
Hi
I'm a student working on a project in which I need to build a dashboard using DC.js... but I picked global inequality as my topic.
Rufus Pollock
@rufuspollock
@JayPeaa sounds great!
JayPeaa
@JayPeaa
I'm trying to get my hands on a dataset which will enable me to build an interactive dashboard rather than just individual graphs.
@rufuspollock thanks :)
I'm wondering if datahub can help with providing some data?
I may have picked a challenging topic.
Rufus Pollock
@rufuspollock
@JayPeaa I think we can definitely help - I’d start by looking at https://datahub.io/collections/wealth-income-and-inequality
JayPeaa
@JayPeaa
Thanks
I think the challenge is I'm not 100% certain on what it is I'm looking for until I see it.
I'll check out that link.
Rufus Pollock
@rufuspollock
@JayPeaa we also want to add to that page a link to, and a summary of, the World Inequality Database - https://wid.world/