Kleanthis
@kleanthisk10
This package is not ready for use yet; it would be better to wait a couple of weeks. Do you still want to build it?
Heidi Seibold
@HeidiBaya_twitter
I am confused: there is a blog post on how to use it, but it is not ready for use.
Kleanthis
@kleanthisk10
Which blog post is this?
@HeidiBaya_twitter Sorry for that. We've started a rebuild of the R lib and haven't updated the blog post yet.
jobarratt
@jobarratt
The blog post is still relevant but will relate to the RopenSci library
Heidi Seibold
@HeidiBaya_twitter
Ok, thanks. Are the people at ROpenSci aware that you are rewriting the package?
Serah Njambi Rono
@serahrono
@HeidiBaya_twitter yes they are.
Heidi Seibold
@HeidiBaya_twitter
:thumbsup:
Do you have a data package that would be interesting for machine learning, i.e. a data set with a clearly defined variable of interest that should be predicted?
So I can use it as an example?
Serah Njambi Rono
@serahrono
@HeidiBaya_twitter check here https://github.com/datasets
Rufus Pollock
@rufuspollock
@callmealien can you please link to https://datahub.io/core - that is the official location for core datasets (i.e. the data at github.com/datasets all shows up on datahub.io/core - plus the datahub.io copies are now more up to date) :-)
Serah Njambi Rono
@serahrono
@rufuspollock :+1:
Heidi Seibold
@HeidiBaya_twitter
Cool, thanks
So I tried one example and directly got an error. Not sure if it's a problem of the R package or the data package:
# Load client
library(datapkg)

# Get Data Package
datapackage <- datapkg_read("https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest")
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv.csv
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview.json
#> Warning: Unnamed `col_types` should have the same length as `col_names`.
#> Using smaller of the two.
#> Error in nchar(x): invalid multibyte string, element 2
Rufus Pollock
@rufuspollock

@HeidiBaya_twitter you've just found a bug in those instructions that we're fixing right now ...

Can you do

datapackage <- datapkg_read("https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/datapackage.json")

and see if that works

Heidi Seibold
@HeidiBaya_twitter
library(datapkg)
datapackage <- datapkg_read("https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/datapackage.json")
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv.csv
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview.json
#> Warning: Unnamed `col_types` should have the same length as `col_names`.
#> Using smaller of the two.
#> Error in nchar(x): invalid multibyte string, element 2
Same error
Rufus Pollock
@rufuspollock

There is a bug somewhere in the library: it is trying to read a file that does not exist. As you can see, it repeats the URL in the file path (I suspect it does not support the Data Package v1 spec for the path field ...)
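A minimal sketch of the resolution rule this diagnosis points at, assuming the Data Package v1 behaviour that a resource `path` is either a fully qualified URL (used as-is) or a path relative to the directory containing `datapackage.json`. The function name `resolve_resource_path` is hypothetical and not part of `datapkg`; it only handles a single string path, although the spec also allows an array of paths.

```r
# Hypothetical helper (not part of datapkg): resolve one resource path
# against the URL of the datapackage.json it came from.
resolve_resource_path <- function(descriptor_url, path) {
  # A fully qualified URL is used as-is; prepending the package URL here
  # is what produces the doubled URLs in the log above.
  if (grepl("^https?://", path)) {
    return(path)
  }
  # Otherwise treat the path as relative to the descriptor's directory:
  # strip the last URL segment (e.g. "datapackage.json") and append.
  base <- sub("[^/]*$", "", descriptor_url)
  paste0(base, path)
}
```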

I suggest you load the CSV directly in R for now (which I know somewhat defeats the point) ;-)

e.g.

https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv.csv

Or any other links directly from the https://datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013
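Loading the CSV directly needs nothing beyond base R. A minimal sketch using `utils::read.csv()` on the resource URL; `read_resource_csv` is a hypothetical wrapper name, and passing `encoding = "UTF-8"` is an assumption about the file's encoding, not something confirmed in this thread.

```r
# Hypothetical wrapper: read one CSV resource straight from its URL (or a
# local path), bypassing the data package library entirely.
read_resource_csv <- function(url) {
  utils::read.csv(url, stringsAsFactors = FALSE, encoding = "UTF-8")
}

# Usage (requires network access):
# csv_url <- paste0(
#   "https://pkgstore.datahub.io/JohnSnowLabs/",
#   "diagnosed-diabetes-prevalence-2004-2013:",
#   "diagnosed-diabetes-prevalence-2004-2013-csv_csv/data/",
#   "diagnosed-diabetes-prevalence-2004-2013-csv_csv.csv"
# )
# diabetes <- read_resource_csv(csv_url)
# str(diabetes)
```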

Heidi Seibold
@HeidiBaya_twitter
Thanks @rufuspollock for the super quick help
Rufus Pollock
@rufuspollock

@HeidiBaya_twitter :-) - it is so great to have you using the data - please keep the feedback (and bugs) coming so we can fix them. We should have R instructions updated on site this week and hope we will have a usable R lib soon.

In the meantime you could almost do it yourself:

DIY

Just open the datapackage.json link:

https://datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/datapackage.json (will redirect for you ...)

Open it and look at the resources section: you'll find all the CSV files, with their schemas etc.

Common pattern:

https://datahub.io/{owner}/{dataset-name}/datapackage.json
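The DIY route can be sketched in R as two small helpers, assuming the URL pattern above and a descriptor already parsed into a list (for example with `jsonlite::fromJSON(url, simplifyVector = FALSE)`, an extra package not mentioned in this thread). Both function names are hypothetical, and `list_resources` assumes every resource has a `name` and a single string `path`.

```r
# Hypothetical helper: build the descriptor URL from the common pattern
# https://datahub.io/{owner}/{dataset-name}/datapackage.json
datahub_descriptor_url <- function(owner, dataset) {
  sprintf("https://datahub.io/%s/%s/datapackage.json", owner, dataset)
}

# Hypothetical helper: tabulate the "resources" section of a parsed
# descriptor (a list of lists), one row per resource.
list_resources <- function(descriptor) {
  data.frame(
    name = vapply(descriptor$resources, function(r) r$name, character(1)),
    path = vapply(descriptor$resources, function(r) r$path, character(1)),
    stringsAsFactors = FALSE
  )
}

# Usage (requires network access and the jsonlite package):
# url  <- datahub_descriptor_url("JohnSnowLabs", "diagnosed-diabetes-prevalence-2004-2013")
# desc <- jsonlite::fromJSON(url, simplifyVector = FALSE)
# list_resources(desc)
```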

DataHub.io chat channel

We also have a dedicated chat channel just for datahub.io at http://gitter.im/datahubio/chat

Heidi Seibold
@HeidiBaya_twitter
Cool, thanks. I will keep trying ;)
Rufus Pollock
@rufuspollock
@roll what is the API spec for the R development? And @kleanthisk10, what is the current status - can I track it somewhere? We need this so we can update the instructions on datahub.io for R users :-)
roll
@roll
@rufuspollock As for all implementations, the spec is the high-level requirements at http://specs.frictionlessdata.io/implementation/ plus a recommendation to stay as close as possible to the reference implementation - https://github.com/frictionlessdata/implementations#implementation (at least for naming)
Kleanthis
@kleanthisk10
@rufuspollock the R library implementations are here: https://github.com/okgreece/tableschema-r and https://github.com/okgreece/datapackage-r. The remaining to-dos for tableschema can be found at okgreece/tableschema-r#1, and those for datapackage will be available soon. Of course they will be moved to the frictionlessdata repo when they're ready!
strets123
@strets123
Hi, Andy Stretton from Zegami here. I am interested in improving the talk I am giving about datapackage pipelines. What I would like to know is whether there is a preferred style for adding images to datapackages.
Rufus Pollock
@rufuspollock
@strets123 what kind of images? Graphs of stuff or just simple pngs / jpgs?
strets123
@strets123
Would be simple jpegs - each data record could have one or more of them. I am creating datapackage pipelines to download data from various open data APIs and there are images which I want to do local processing on e.g. tensorflow etc.
I also have another use case: an image which is cut into a set of sub-images with a measurement made on each, as in recognising cells on a microscope slide.
Rufus Pollock
@rufuspollock
@strets123 you can add images as resources (or tar/gz them together). If you want to associate an image with a resource or a data package, you can use the image property http://specs.frictionlessdata.io/data-package/#image
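A minimal sketch of the first option, one resource per JPEG, building the `resources` entries in R before writing them out as `datapackage.json`. The helper name `image_resources`, the package name and the file paths are hypothetical; `name`, `path`, `format` and `mediatype` are standard Data Package resource properties, and `image` is the package-level property linked above. Names are lower-cased with runs of other characters replaced by `-`, since the spec expects lowercase resource names.

```r
# Hypothetical helper: turn a vector of JPEG file paths into Data Package
# resource entries (plain lists, ready to be serialised as JSON).
image_resources <- function(paths) {
  lapply(paths, function(p) {
    stem <- tools::file_path_sans_ext(basename(p))
    list(
      # lowercase; any run of non-alphanumeric characters becomes "-"
      name = tolower(gsub("[^A-Za-z0-9]+", "-", stem)),
      path = p,
      format = "jpg",
      mediatype = "image/jpeg"
    )
  })
}

# Hypothetical descriptor: "image" is the package-level image property.
descriptor <- list(
  name = "microscope-slides",
  image = "images/cover.jpg",
  resources = image_resources(c("images/Slide 01.jpg", "images/cell_02.jpg"))
)
```

Writing it out could then be done with, e.g., `jsonlite::write_json(descriptor, "datapackage.json", auto_unbox = TRUE, pretty = TRUE)`, assuming jsonlite is installed.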
strets123
@strets123
Is there a tag in tableschema to say that a column refers to an image?
Rufus Pollock
@rufuspollock
No, not atm
strets123
@strets123
OK, thanks for the info
Rufus Pollock
@rufuspollock
But you could support it yourself - I will tell you how later
strets123
@strets123
Thanks
I was thinking of borrowing some attributes from http://iiif.io/
strets123
@strets123
but it would be nice to know what the most minimal approach is
Oleg Lavrovsky
@loleg
@jobarratt @pwalsh thanks for the chat today! Here are two projects I mentioned http://github.com/datalets/dribdat & https://github.com/schoolofdata-ch/datacentral
roll
@roll
@loleg welcome to the spec implementers team!
Oleg Lavrovsky
@loleg
thanks a lot @roll
Rufus Pollock
@rufuspollock

dribdat looks cool :sparkles:

And great to see you here @loleg

Oleg Lavrovsky
@loleg
I've been lurking for a while @rufuspollock & thanks :)
Rufus Pollock
@rufuspollock
Yes, good to see you active here :-)
jobarratt
@jobarratt
great to have you here @loleg and delighted that you are going to be working with us on implementations of the libraries in Julia
Oleg Lavrovsky
@loleg
Indeed, looking forward to getting in gear. But first, we have a hackathon to run - and that means making more data packages! https://discuss.okfn.org/t/october-27-open-tourism-data-hackathon/5860
Rufus Pollock
@rufuspollock

@loleg that's great and you can start pushing your data packages to the new https://datahub.io/ - if you are interested in being an alpha publisher user just sign up and then fill in the short questionnaire ...