Rufus Pollock
@rufuspollock

Is there a recommended maximum file size for use with tabular data resources? When running data validate I get a warning about a memory leak. On a 30MB resource, it seems to work fine. On a 160MB resource eventually I get a core dump with JavaScript running out of memory on a machine with 24GB of RAM. For larger data packages does it make more sense to use the python goodtables for validation instead of the Node.js based command line tool?

No, there is no limit for tabular data packages. This is a bug with data validate - can you open an issue on https://github.com/datahq/data-cli

I think you can use either route and for bigger packages goodtables may be better (and is used internally).
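For larger packages, here is a minimal sketch of validating with the goodtables Python library instead of the Node-based data validate (the preset argument and the report keys are assumptions based on the goodtables report format):

from goodtables import validate

# Validate a local data package descriptor (the path is a placeholder).
report = validate('datapackage.json', preset='datapackage')

print(report['valid'])                  # overall result (assumed report key)
for table in report.get('tables', []):
    for error in table.get('errors', []):
        print(error)                    # per-row/column validation errors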

My other question here is whether any of the files can be chunked/partitioned - frictionlessdata/specs#620

M. Ali Naqvi
@MAliNaqvi

Hi folks,

I wanted to update our datasets on datahub.io/johnsnowlabs

When pushing the dataset this is what I got:

> Error! Max storage for user exceeded plan limit (5000MB)

However, the total size of the data that has been uploaded is ~200MB

Rufus Pollock
@rufuspollock
@MAliNaqvi we'll need to fix that - probably the total size of the other datasets exceeds 5GB
Rufus Pollock
@rufuspollock
@MAliNaqvi fixed
M. Ali Naqvi
@MAliNaqvi
@rufuspollock The problem is still there. Is there any further information I can provide?
Irakli Mchedlishvili
@zelima
@MAliNaqvi is it still saying 5000?
Irakli Mchedlishvili
@zelima
@MAliNaqvi never mind, found the problem
Rufus Pollock
@rufuspollock
@zelima is it fixed?
Irakli Mchedlishvili
@zelima
@rufuspollock I've sent instructions to fix this privately.
@MAliNaqvi can you confirm it's fixed now?
M. Ali Naqvi
@MAliNaqvi
Not yet. Still need some support around using the updated config.
M. Ali Naqvi
@MAliNaqvi
@zelima sent you a private message
M. Ali Naqvi
@MAliNaqvi
@zelima the issue has been resolved. Thank you!
M. Ali Naqvi
@MAliNaqvi
Folks, how do I list the datasets that are available on an account?
Irakli Mchedlishvili
@zelima
Datasets should be available on https://datahub.io/{username}
M. Ali Naqvi
@MAliNaqvi
The whole list all at once?
Is there an API or a function in the data utility?
M. Ali Naqvi
@MAliNaqvi

At the moment I am scraping the list of pages, but it would be great to somehow get an exhaustive list of what has been uploaded. My use case is that we have a list of 217 datasets that we want uploaded. However, only 197 were uploaded. How do we identify the ones that weren't processed?

I went through the data utility logs which seemed to have uploaded everything.

Irakli Mchedlishvili
@zelima
@MAliNaqvi you can use our API to list the datasets; here's an example (see the sketch below): https://api.datahub.io/metastore/search?datahub.owner="core"&size=100&from=0
Where:
  • datahub.owner is the owner_id, not the username
  • size is the maximum number of items per page (the maximum is 100)
  • from is the page number, starting from 0
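A minimal sketch of calling that endpoint from Python with requests (the response field names are assumptions, so check the API docs linked further down):

import requests

# List datasets for an owner via the metastore search endpoint described above.
owner_id = "core"  # this is the owner_id, not the username
params = {
    'datahub.owner': '"{}"'.format(owner_id),  # the quotes must wrap the owner id
    'size': 100,                               # max items per page (100 is the maximum)
    'from': 0,                                 # page, starting from 0
}
resp = requests.get('https://api.datahub.io/metastore/search', params=params)
resp.raise_for_status()
for dataset in resp.json().get('results', []):  # 'results' is an assumed key
    print(dataset.get('id'))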
M. Ali Naqvi
@MAliNaqvi
Nice!
Irakli Mchedlishvili
@zelima
Note: " quotes mast be included around onwerID as in example
You can also send the JWT token in headers Auth-Token=your_jwt or as query parameter &jwt=your_jwt to list the unlisted/private datasets as well
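And a short variant of the same request that sends the JWT so unlisted/private datasets are returned too (the token value is a placeholder):

import requests

# Authenticated search: unlisted/private datasets are included in the results.
headers = {'Auth-Token': 'your_jwt'}  # alternatively append &jwt=your_jwt to the URL
params = {'datahub.owner': '"core"', 'size': 100, 'from': 0}
resp = requests.get('https://api.datahub.io/metastore/search',
                    params=params, headers=headers)
print(len(resp.json().get('results', [])))  # 'results' is an assumed key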
Irakli Mchedlishvili
@zelima
Here you can find the documentation for the API: https://github.com/datahq/metastore#api
M. Ali Naqvi
@MAliNaqvi
Thank you @zelima
Folks, I just spent a couple of hours uploading 43 datasets. It was very frustrating to find that only 3 of those datasets made it to the datahub website, even though the data utility uploaded everything without an issue. Here are the results:
M. Ali Naqvi
@MAliNaqvi
| dataset                                                            | url                                                                                                    | AVAILABLE |
|--------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------+-----------|
| nasa-temperature-anomalies-by-latitude-bands-time-series-1880-2017 | https://datahub.io/JohnSnowLabs/nasa-temperature-anomalies-by-latitude-bands-time-series-1880-2017/v/2 | No        |
| chicago-annual-taxpayer-location-list                              | https://datahub.io/JohnSnowLabs/chicago-annual-taxpayer-location-list/v/2                              | No        |
| nasa-global-temperature-anomalies-time-series-1880-2018            | https://datahub.io/JohnSnowLabs/nasa-global-temperature-anomalies-time-series-1880-2018/v/2            | No        |
| nj-residents-leading-causes-of-death                               | https://datahub.io/JohnSnowLabs/nj-residents-leading-causes-of-death/v/2                               | No        |
| uk-properties-for-sale-by-ministry-of-defense                      | https://datahub.io/JohnSnowLabs/uk-properties-for-sale-by-ministry-of-defense/v/2                      | No        |
| tree-debris-requested-by-311-service                               | https://datahub.io/JohnSnowLabs/tree-debris-requested-by-311-service/v/2                               | No        |
| tree-trims-requested-by-311-service                                | https://datahub.io/JohnSnowLabs/tree-trims-requested-by-311-service/v/2                                | No        |
| garbage-carts-requested-by-311-service                             | https://datahub.io/JohnSnowLabs/garbage-carts-requested-by-311-service/v/2                             | No        |
| pot-holes-reported-by-311-service                                  | https://datahub.io/JohnSnowLabs/pot-holes-reported-by-311-service/v/2                                  | No        |
| eicu-collaborative-research-admissions-summary-statistics          | https://datahub.io/JohnSnowLabs/eicu-collaborative-research-admissions-summary-statistics/v/1          | Yes       |
| chicago-taxi-trips                                                 | https://datahub.io/JohnSnowLabs/chicago-taxi-trips/v/2                                                 | No        |
| chicago-beach-weather-stations-automated-sensors                   | https://datahub.io/JohnSnowLabs/chicago-beach-weather-stations-automated-sensors/v/2                   | No        |
| chicago-beach-water-quality-automated-sensors-report               | https://datahub.io/JohnSnowLabs/chicago-beach-water-quality-automated-sensors-report/v/2               | No        |
| all-countries-latitude-longitude                                   | https://datahub.io/JohnSnowLabs/all-countries-latitude-longitude/v/4                                   | No        |
| estimates-emissions-of-co2-at-country-and-global-level             | https://datahub.io/JohnSnowLabs/estimates-emissions-of-co2-at-country-and-global-level/v/2             | No        |
| energy-consumption-by-mode-of-transportation-and-type-of-energy    | https://datahub.io/JohnSnowLabs/energy-consumption-by-mode-of-transportation-and-type-of-energy/v/2    | No        |
| relocated-vehicles-in-chicago-last-90-days                         | https://datahub.io/JohnSnowLabs/relocated-vehicles-in-chicago-last-90-days/v/1                         | No        |
| nys-english-and-mathematics-exam                                   | https://datahub.io/JohnSnowLabs/nys-english-and-mathematics-exam/v/2                                   | No        |
| schools-for-life-safety-evaluations                                | https://datahub.io/JohnSnowLabs/schools-for-life-safety-evaluations/v/2                                | No        |
| food-affordability-for-households-led-by-females                   | https://datahub.io/JohnSnowLabs/food-affordability-for-households-led-by-females/v/2                   | No        |
| chicago-business-licenses                                          | https://datahub.io/JohnSnowLabs/chicago-business-licenses/v/1                                          | No        |
| city-population-annual-time-series                                 | https://datahub.io/JohnSnowLabs/city-population-annual-time-series/v/3                                 | No        |
| bloomington-animal-care-and-control-adopted-animals                | https://datahub.io/JohnSnowLabs/bloomington-animal-care-and-control-adopted-animals/v/2                | No        |
| legally-operating-businesses                                       | https://datahub.io/JohnSnowLabs/legally-operating-businesses/v/2                                       | No        |
| cta-ridership-bus-routes                                           | https://datahub.io/JohnSnowLabs/cta-ridership-bus-routes/v/1                                           | Yes       |
| most-popular-baby-names-by-gender-and-mother-ethnic-group          | https://datahub.io/JohnSnowLabs/most-popular-baby-names-by-gender-and-mother-ethnic-group/v/2          | No        |
| eicu-collaborative-research-available-tables-and-data              | https://datahub.io/JohnSnowLabs/eicu-collaborative-research-available-tables-and-data/v/1              | Yes       |
| nj-traffic-counts-data                                             | https://datahub.io/JohnSnowLabs/nj-traffic-counts-data/v/2                                             | No        |
| austin-adult-and-children-vaccinations                             | https://datahub.io/JohnSnowLabs/austin-adult-and-children-vaccinations/v/2                             | No        |
| euro-4-cars-emissions-traded-on-uk-market-2000-2012                | https://datahub.io/JohnSnowLabs/euro-4-cars-emissions-traded-on-uk-market-2000-2012/v/2                | No        |
| lobbyist-agency-report                                             | https://datahub.io/JohnSnowLabs/lobbyist-agency-report/v/2                                             | No        |
| windsor-transit-bus-stops                                          | https://datahub.io/JohnSnowLabs/windsor-transit-bus-stops/v/2                                          | No        |
| omha-receipts-for-fiscal-year-2011-2013                            | https://datahub.io/JohnSnowLabs/omha-receipts-for-fiscal-year-2011-2013/v/2                            | No        |
| impaired-driving-death-rate-by-age-and-race                        | https://datahub.io/JohnSnowLabs/impaired-driving-death-rate-by-age-and-race/v/2                        | No        |
| chicago-red-light-and-speed-camera-violations                      | https://datahub.io/JohnSnowLabs/chicago-red-light-and-speed-camera-violations/v/2                      | No        |
| us-employment-and-unemployment-rates                               | https://datahub.io/JohnSnowLabs/us-employment-and-unemployment-rates/v/2                               | No        |
| chicago-affordable-rental-housing-developments                     | https://datahub.io/JohnSnowLabs/chicago-affordable-rental-housing-developments/v/2                     | No        |
| vehicle-occupant-safety-data                                       | https://datahub.io/JohnSnowLabs/vehicle-occupant-safety-data/v/2                                       | No        |
| chicago-traffic-tracker                                            | https://datahub.io/JohnSnowLabs/chicago-traffic-tracker/v/2                                            | No        |
| imf-world-economic-outlook-database                                | https://datahub.io/JohnSnowLabs/imf-world-economic-outlook-database/v/2                                | No        |
| chicago-bike-racks-map                                             | https://datahub.io/JohnSnowLabs/chicago-bike-racks-map/v/2                                             | No        |
| us-states-and-territories                                          | https://datahub.io/JohnSnowLabs/us-states-and-territories/v/2                                          | No        |
| chicago-alternative-fuel-locations                                 | https://datahub.io/JohnSnowLabs/chicago-alternative-fuel-locations/v/2                                 | No        |
Anuar Ustayev
@anuveyatsu

Folks, I just spent a couple of hours uploading 43 datasets. It was very frustrating to find that only 3 of those datasets made it to the datahub website, even though the data utility uploaded everything without an issue. Here are the results:

@MAliNaqvi Hi Ali! As far as I can see, all datasets were uploaded successfully; however, most of them have validation/processing issues. You need to be logged in to see those errors. I know that you’re using an org account, so the best way to check would be to pass your JWT within the query params, e.g., try this: https://datahub.io/JohnSnowLabs/chicago-traffic-tracker/v/2?jwt=<your-jwt> so that you are able to see the FAILED dataset page.

Malikah
@Malikah95
hi
I'm thinking of a dataset for each disease, with its different levels and symptoms, in order to design a medical diagnostic tool. Or should it be for only one disease, like heart disease?
Anuar Ustayev
@anuveyatsu
@Malikah95 Hi! Could you please send a request using our service here - https://datahub.io/requests ?
You could also take a look at existing datasets, e.g., some can be found here - https://datahub.io/machine-learning
Irakli Mchedlishvili
@zelima

@akariv dataflows' sort_rows processor does not seem to be working as expected, any ideas?

from dataflows import Flow, printer, sort_rows

data = [
    {'data': 'B'},
    {'data': 'E'},
    {'data': 'C'},
    {'data': 'D'},
    {'data': 'A'},
]

f = Flow(
      data,
      sort_rows('data'),
      printer()
)
f.process()

results in:

res_1:
  #  data
     (string)
---  ----------
  1  B
  2  E
  3  C
  4  D
  5  A
hm... OK, so looking at the docs here, it seems like I need {} around data: https://github.com/frictionlessdata/datapackage-pipelines#sort
from dataflows import Flow, printer, sort_rows

data = [
    {'data': 'B'},
    {'data': 'E'},
    {'data': 'C'},
    {'data': 'D'},
    {'data': 'A'},
]

f = Flow(
      data,
      sort_rows('{data}'),
      printer()
)
f.process()
Irakli Mchedlishvili
@zelima
Great, now I've got
res_1:
  #  data
     (string)
---  ----------
  1  A
  2  B
  3  C
  4  D
  5  E
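For reference, sort_rows interprets its key as a Python format string, which is why the braces are needed; here is a small sketch (the field names are illustrative) sorting on a key built from two fields:

from dataflows import Flow, printer, sort_rows

# The key is a format string, so several fields can be combined into one sort key.
data = [
    {'year': 2018, 'month': 2},
    {'year': 2017, 'month': 11},
    {'year': 2018, 'month': 1},
]

Flow(
    data,
    sort_rows('{year}{month:02}'),  # sort by year, then by zero-padded month
    printer()
).process()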
Shreenikhil.m.c
@shreenikhil_twitter
I would like to know the original source of this dataset - how do I find it?
The name of the dataset is 2016 Survey of Consumer Finances (Summary Extract) and the link is
Irakli Mchedlishvili
@zelima
@shreenikhil_twitter hi, unfortunately the owner of the dataset has not provided its source in either its metadata or its README. I would try to @ the owner and wait for them to respond.
Usually usernames on DataHub and GitHub match, as people register with their GitHub accounts
Konrad Höffner
@KonradHoeffner
Hello, I would like to apply for an organization account.
Anuar Ustayev
@anuveyatsu
Hi @KonradHoeffner could you please fill in this form - https://datahub.io/docs/features/teams-and-permissions
Konrad Höffner
@KonradHoeffner
Will do, thanks for the extremely quick response! :-)
Anuar Ustayev
@anuveyatsu
You’re welcome :smile:
Michael Decklever
@MDeck06_twitter
Hello. I found a wonderful dataset of S&P financial data. Would you happen to have this same dataset archived for the past 10 years?
S&P 500 Companies with Financial Information
Mostafa Senousy
@Senousy_gitlab
I need datasets for eczema (image processing).
Is there any help available?
All my appreciation for your cooperation.