Rufus Pollock
@rufuspollock
@zaneselvans yes - the key point is that you do the wrangling in your tool of choice, perhaps using the pattern developed in dataflows, and then to "push" to datahub you use data or the API directly.
Johan Richer
@johanricher
Hey guys, FYI following discussions we had with them a while back, our ex-colleagues at Etalab (the French open data agency) are starting to work with data packages:
https://github.com/opendatateam/datapackage-pipelines-udata
https://twitter.com/taniki/status/1035110812011126785
(uData is the software project powering data.gouv.fr)
Paul Walsh
@pwalsh
@johanricher great to hear!
Chris Hale
@chrispomeroyhale
Hi. I'm having trouble loading https://datahub.io/core -- I get a 502 from Cloudflare
Irakli Mchedlishvili
@zelima
@slythfox thanks for reporting. We are working on it
@slythfox should be fine now
Zane Selvans
@zaneselvans
@johanricher @pwalsh The French electricity grid operator (RTE France) has been enthusiastic about providing open data, and someone from their open data portal (named Hoang Nguyen) got an earful yesterday in support of Data Packages from the folks at the Open Power System Data project. It might be useful to connect someone from etalab with Nguyen at RTE, if they aren't in touch already.
Johan Richer
@johanricher
@zaneselvans sure, I'd be glad to help them navigate! Can you put me in contact?
Anuar Ustayev
@anuveyatsu

📰📢 Check out a list of core datasets that are updated on a regular basis:

https://datahub.io/blog/automatically-updated-core-datasets-on-datahub
Vaibhav Maheshwari
@vaibhavgeek
I have a CSV file that was renamed for a competition. I want to know which dataset that file actually belongs to.
Does anyone know a place where I can upload the CSV file and it shows me relevant results?
Rufus Pollock
@rufuspollock

@vaibhavgeek can you give a bit more detail on the issue with file rename.

To upload a file: just follow the instructions here https://datahub.io/docs/getting-started/publishing-data

Stephen Abbott Pugh
@StephenAbbott

Hi there. Just been testing out Google's new Dataset Search and found some spam datasets uploaded to the old datahub.io around 2013.

Example = https://toolbox.google.com/datasetsearch/search?query=black%20site%3Adatahub.io&docid=hSZEp7J5ZDHbBETSAAAAAA%3D%3D

Where could/should I raise an issue to look at removing spam? Thanks

Rufus Pollock
@rufuspollock
@StephenAbbott flagging it here is perfect, or on https://github.com/datahq/datahub-qa/issues
Rufus Pollock
@rufuspollock
@StephenAbbott did you manage to flag this?
Rufus Pollock
@rufuspollock
[screenshot]
Rufus Pollock
@rufuspollock

📰 "Awesome" page renamed to collections and made beautiful

See screenshot above and visit the page:

https://datahub.io/collections

Zane Selvans
@zaneselvans
Do folks have a favorite easy to use package for visualizing and filtering data that's accessible via data packages? Something that a relative layperson could use?
Zane Selvans
@zaneselvans
Is there a recommended maximum file size for use with tabular data resources? When running data validate I get a warning about a memory leak. On a 30MB resource, it seems to work fine. On a 160MB resource eventually I get a core dump with JavaScript running out of memory on a machine with 24GB of RAM. For larger data packages does it make more sense to use the python goodtables for validation instead of the Node.js based command line tool?
Rufus Pollock
@rufuspollock

> Do folks have a favorite easy to use package for visualizing and filtering data that's accessible via data packages? Something that a relative layperson could use?

The perfect thing would be something that already ingests tabular data but is made Data Package aware. Right now you can fall back to anything that can ingest CSV (which is pretty much all tools). I can suggest some tools for playing with data that would suit (and we could think about how to plug in Data Package support, as we have with e.g. pandas).

> Is there a recommended maximum file size for use with tabular data resources? When running data validate I get a warning about a memory leak. On a 30MB resource, it seems to work fine. On a 160MB resource eventually I get a core dump with JavaScript running out of memory on a machine with 24GB of RAM. For larger data packages does it make more sense to use the python goodtables for validation instead of the Node.js based command line tool?

No, there is no limit for tabular data packages. This is a bug with data validate - can you open an issue on https://github.com/datahq/data-cli?

I think you can use either route and for bigger packages goodtables may be better (and is used internally).

My other question here is whether any of the files can be chunked/partitioned - frictionlessdata/specs#620

M. Ali Naqvi
@MAliNaqvi

Hi folks,

I wanted to update our datasets on datahub.io/johnsnowlabs

When pushing the dataset this is what I got:

> Error! Max storage for user exceeded plan limit (5000MB)

However the total size of the data that has been uploaded is ~200MB

Rufus Pollock
@rufuspollock
@MAliNaqvi we'll need to fix that - probably the total size of the other datasets exceeds 5GB
Rufus Pollock
@rufuspollock
@MAliNaqvi fixed
M. Ali Naqvi
@MAliNaqvi
@rufuspollock The problem is still there. Is there any further information I can provide?
Irakli Mchedlishvili
@zelima
@MAliNaqvi is it still saying 5000?
Irakli Mchedlishvili
@zelima
@MAliNaqvi never mind, found the problem
Rufus Pollock
@rufuspollock
@zelima is it fixed?
Irakli Mchedlishvili
@zelima
@rufuspollock I've sent instructions to fix this privately.
@MAliNaqvi can you confirm it's fixed now?
M. Ali Naqvi
@MAliNaqvi
Not yet. Still need some support around using the updated config.
M. Ali Naqvi
@MAliNaqvi
@zelima sent you a private message
M. Ali Naqvi
@MAliNaqvi
@zelima the issue has been resolved. Thank you!
M. Ali Naqvi
@MAliNaqvi
Folks how do I list datasets that are available on an account?
Irakli Mchedlishvili
@zelima
Datasets should be available on https://datahub.io/{username}
M. Ali Naqvi
@MAliNaqvi
The whole list all at once?
Is there an API or a function in the data utility?
M. Ali Naqvi
@MAliNaqvi

At the moment I am scraping the list of pages, but it would be great to somehow get an exhaustive list of what has been uploaded. My use case is that we have a list of 217 datasets that we want uploaded. However, only 197 were uploaded. How do we identify the ones that weren't processed?

I went through the data utility logs, which suggest that everything was uploaded.

Irakli Mchedlishvili
@zelima
@MAliNaqvi you can use our API to list the datasets. Here's an example: https://api.datahub.io/metastore/search?datahub.owner="core"&size=100&from=0
Where:
  • datahub.owner is owner_id, not username
  • Size is max items per page (Max is 100)
  • From is the page number starting from 0
M. Ali Naqvi
@MAliNaqvi
Nice!
Irakli Mchedlishvili
@zelima
Note: the " quotes must be included around the owner ID, as in the example
You can also send the JWT token in the headers as Auth-Token=your_jwt, or as the query parameter &jwt=your_jwt, to list the unlisted/private datasets as well
Irakli Mchedlishvili
@zelima
Here you can find the documentation for the API: https://github.com/datahq/metastore#api
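For anyone wanting to script this, here is a rough sketch of paging through the search endpoint above and comparing the result with a local list of expected datasets. It is not an official client. Assumptions: the response is JSON with a "results" list whose items carry a "name" field; `from` advances by 1 per page as described above (pass `step=size` if it turns out to be an item offset); and the helper names (`build_search_url`, `list_dataset_names`, `missing_datasets`, `fetch_json`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.datahub.io/metastore/search"


def build_search_url(owner_id, size=100, start=0, jwt=None):
    """Build a search URL; the owner ID is wrapped in double quotes as required."""
    params = {"datahub.owner": f'"{owner_id}"', "size": size, "from": start}
    if jwt:
        # Optional: include the JWT to also see unlisted/private datasets.
        params["jwt"] = jwt
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def list_dataset_names(owner_id, fetch=fetch_json, size=100, step=1, jwt=None):
    """Collect dataset names page by page until a short or empty page is seen.

    ASSUMPTION: each page looks like {"results": [{"name": ...}, ...]}.
    """
    names, seen, start = [], set(), 0
    while True:
        page = fetch(build_search_url(owner_id, size, start, jwt))
        results = page.get("results", [])
        new = []
        for item in results:
            name = item.get("name")
            if name is not None and name not in seen:
                seen.add(name)
                new.append(name)
        if not new:
            break  # empty page, or only repeats: stop rather than loop forever
        names.extend(new)
        if len(results) < size:
            break  # short page means this was the last one
        start += step
    return names


def missing_datasets(expected, uploaded):
    """Return the expected dataset names that are absent from the uploaded list."""
    return sorted(set(expected) - set(uploaded))
```

With a list of the names you intended to push, something like `missing_datasets(expected_names, list_dataset_names(owner_id))` would give the ones that did not make it, where `owner_id` is the owner ID and not the username.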
M. Ali Naqvi
@MAliNaqvi
Thank you @zelima
Folks, I just spent a couple of hours uploading 43 datasets. It was very frustrating to find that only 3 of those datasets made it to the datahub website, even though the data utility uploaded everything without an issue. Here are the results:
M. Ali Naqvi
@MAliNaqvi
| dataset                                                            | url                                                                                                    | AVAILABLE |
|--------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------+-----------|
| nasa-temperature-anomalies-by-latitude-bands-time-series-1880-2017 | https://datahub.io/JohnSnowLabs/nasa-temperature-anomalies-by-latitude-bands-time-series-1880-2017/v/2 | No        |
| chicago-annual-taxpayer-location-list                              | https://datahub.io/JohnSnowLabs/chicago-annual-taxpayer-location-list/v/2                              | No        |
| nasa-global-temperature-anomalies-time-series-1880-2018            | https://datahub.io/JohnSnowLabs/nasa-global-temperature-anomalies-time-series-1880-2018/v/2            | No        |
| nj-residents-leading-causes-of-death                               | https://datahub.io/JohnSnowLabs/nj-residents-leading-causes-of-death/v/2                               | No        |
| uk-properties-for-sale-by-ministry-of-defense                      | https://datahub.io/JohnSnowLabs/uk-properties-for-sale-by-ministry-of-defense/v/2                      | No        |
| tree-debris-requested-by-311-service                               | https://datahub.io/JohnSnowLabs/tree-debris-requested-by-311-service/v/2                               | No        |
| tree-trims-requested-by-311-service                                | https://datahub.io/JohnSnowLabs/tree-trims-requested-by-311-service/v/2                                | No        |
| garbage-carts-requested-by-311-service                             | https://datahub.io/JohnSnowLabs/garbage-carts-requested-by-311-service/v/2                             | No        |
| pot-holes-reported-by-311-service                                  | https://datahub.io/JohnSnowLabs/pot-holes-reported-by-311-service/v/2                                  | No        |
| eicu-collaborative-research-admissions-summary-statistics          | https://datahub.io/JohnSnowLabs/eicu-collaborative-research-admissions-summary-statistics/v/1          | Yes       |
| chicago-taxi-trips                                                 | https://datahub.io/JohnSnowLabs/chicago-taxi-trips/v/2                                                 | No        |
| chicago-beach-weather-stations-automated-sensors                   | https://datahub.io/JohnSnowLabs/chicago-beach-weather-stations-automated-sensors/v/2                   | No        |
| chicago-beach-water-quality-automated-sensors-report               | https://datahub.io/JohnSnowLabs/chicago-beach-water-quality-automated-sensors-report/v/2               | No        |
| all-countries-latitude-longitude                                   | https://datahub.io/JohnSnowLabs/all-countries-latitude-longitude/v/4                                   | No        |
| estimates-emissions-of-co2-at-country-and-global-level             | https://datahub.io/JohnSnowLabs/estimates-emissions-of-co2-at-country-and-global-level/v/2             | No        |
| energy-consumption-by-mode-of-transportation-and-type-of-energy    | https://datahub.io/JohnSnowLabs/energy-consumption-by-mode-of-transportation-and-type-of-energy/v/2    | No        |
| relocated-vehicles-in-chicago-last-90-days                         | https://datahub.io/JohnSnowLabs/relocated-vehicles-in-chicago-last-90-days/v/1                         | No        |
| nys-english-and-mathematics-exam                                   | https://datahub.io/JohnSnowLabs/nys-english-and-mathematics-exam/v/2                                   | No        |
| schools-for-life-safety-evaluations                                | https://datahub.io/JohnSnowLabs/schools-for-life-safety-evaluations/v/2                                | No        |
| food-affordability-for-households-led-by-females                   | https://datahub.io/JohnSnowLabs/food-affordability-for-households-led-by-females/v/2                   | No        |
| chicago-business-licenses                                          | https://datahub.io/JohnSnowLabs/chicago-business-licenses/v/1                                          | No        |
| city-population-annual-time-series                                 | https://datahub.io/JohnSnowLabs/city-population-annual-time-series/v/3                                 | No        |
| bloomington-animal-care-and-control-adopted-animals                | https://datahub.io/JohnSnowLabs/bloomington-animal-care-and-control-adopted-animals/v/2                | No        |
| legally-operating-businesses                                       | https://datahub.io/JohnSnowLabs/legally-operating-businesses/v/2                                       | No        |
| cta-ridership-bus-routes                                           | https://datahub.io/JohnSnowLabs/cta-ridership-bus-routes/v/1                                           | Yes       |
| most-popular-baby-names-by-gender-and-mother-ethnic-group          | https://datahub.io/JohnSnowLabs/most-popular-baby-names-by-gender-and-mother-ethnic-group/v/2          | No        |
| eicu-collaborative-research-available-tables-and-data              | https://datahub.io/JohnSnowLabs/eicu-collaborative-research-available-tables-and-data/v/1              | Yes       |
| nj-traffic-counts-data                                             | https://datahub.io/JohnSnowLabs/nj-traffic-counts-data/v/2                                             | No        |
| austin-adult-and-children-vaccinations                             | https://datahub.io/JohnSnowLabs/austin-adult-and-children-vaccinations/v/2                             | No        |
| euro-4-cars-emissions-traded-on-uk-market-2000-2012                | https://datahub.io/JohnSnowLabs/euro-4-cars-emissions-traded-on-uk-market-2000-2012/v/2                | No        |
| lobbyist-agency-report                                             | https://datahub.io/JohnSnowLabs/lobbyist-agency-report/v/2                                             | No        |
| windsor-transit-bus-stops                                          | https://datahub.io/JohnSnowLabs/windsor-transit-bus-stops/v/2                                          | No        |
| omha-receipts-for-fiscal-year-2011-2013                            | https://datahub.io/JohnSnowLabs/omha-receipts-for-fiscal-year-2011-2013/v/2                            | No        |
| impaired-driving-death-rate-by-age-and-race                        | https://datahub.io/JohnSnowLabs/impaired-driving-death-rate-by-age-and-race/v/2                        | No        |
| chicago-red-light-and-speed-camera-violations                      | https://datahub.io/JohnSnowLabs/chicago-red-light-and-speed-camera-violations/v/2                      | No        |
| us-employment-and-unemployment-rates                               | https://datahub.io/JohnSnowLabs/us-employment-and-unemployment-rates/v/2                               | No        |
| chicago-affordable-rental-housing-developments                     | https://datahub.io/JohnSnowLabs/chicago-affordable-rental-housing-developments/v/2                     | No        |
| vehicle-occupant-safety-data                                       | https://datahub.io/JohnSnowLabs/vehicle-occupant-safety-data/v/2                                       | No        |
| chicago-traffic-tracker                                            | https://datahub.io/JohnSnowLabs/chicago-traffic-tracker/v/2                                            | No        |
| imf-world-economic-outlook-database                                | https://datahub.io/JohnSnowLabs/imf-world-economic-outlook-database/v/2                                | No        |
| chicago-bike-racks-map                                             | https://datahub.io/JohnSnowLabs/chicago-bike-racks-map/v/2                                             | No        |
| us-states-and-territories                                          | https://datahub.io/JohnSnowLabs/us-states-and-territories/v/2                                          | No        |
| chicago-alternative-fuel-locations                                 | https://datahub.io/JohnSnowLabs/chicago-alternative-fuel-locations/v/2                                 | No        |