Heidi Seibold
@HeidiBaya_twitter
So I can use it as an example?
Serah Njambi Rono
@serahrono
@HeidiBaya_twitter check here https://github.com/datasets
Rufus Pollock
@rufuspollock
@callmealien can you please link to https://datahub.io/core - that is the official location for core datasets (i.e. the data at github.com/datasets all shows up on datahub.io/core - plus datahub.io are now more up to date) :-)
Serah Njambi Rono
@serahrono
@rufuspollock :+1:
Heidi Seibold
@HeidiBaya_twitter
Cool, thanks
So I tried one example and directly got an error. Not sure if it's a problem of the R package or the data package:
# Load client
library(datapkg)

# Get Data Package
datapackage <- datapkg_read("https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest")
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv.csv
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview.json
#> Warning: Unnamed `col_types` should have the same length as `col_names`.
#> Using smaller of the two.
#> Error in nchar(x): invalid multibyte string, element 2
Rufus Pollock
@rufuspollock

@HeidiBaya_twitter you've just found a bug we're fixing right now in those instructions ...

Can you do

datapackage <- datapkg_read("https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/datapackage.json")

and see if that works

Heidi Seibold
@HeidiBaya_twitter
library(datapkg)
datapackage <- datapkg_read("https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/datapackage.json")
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv.csv
#> Reading file https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/latest/https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv_preview.json
#> Warning: Unnamed `col_types` should have the same length as `col_names`.
#> Using smaller of the two.
#> Error in nchar(x): invalid multibyte string, element 2
Same error
Rufus Pollock
@rufuspollock

There is something buggy in the library somewhere - it is trying to read a file that does not exist - as you can see, it is repeating the url in the file path (i suspect it does not support the dp v1 specs' path field ...)

I suggest loading the CSV directly in R for now (which i know somewhat defeats the point) ;-)

e.g.

https://pkgstore.datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013:diagnosed-diabetes-prevalence-2004-2013-csv_csv/data/diagnosed-diabetes-prevalence-2004-2013-csv_csv.csv

Or any other links directly from the https://datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013

Heidi Seibold
@HeidiBaya_twitter
Thanks @rufuspollock for the super quick help
Rufus Pollock
@rufuspollock

@HeidiBaya_twitter :-) - it is so great to have you using the data - please keep the feedback (and bugs) coming so we can fix them. We should have the R instructions updated on the site this week and hope to have a usable R lib soon.

In the meantime you could almost just do it yourself:

DIY

Just open the datapackage.json link:

https://datahub.io/JohnSnowLabs/diagnosed-diabetes-prevalence-2004-2013/datapackage.json (will redirect for you ...)

Open that and look at the resources section and you'll have all the CSV files with schemas etc

Common pattern:

https://datahub.io/{owner}/{dataset-name}/datapackage.json
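That pattern can be sketched in Python. The helper and descriptor below are illustrative (the dataset name and resource paths are made up), but the `resources` layout matches the shape a datapackage.json uses:

```python
import json

# Common URL pattern described above (owner/dataset are placeholders)
def datapackage_url(owner, dataset):
    return f"https://datahub.io/{owner}/{dataset}/datapackage.json"

# A minimal descriptor in the shape the Data Package spec defines;
# a real one would be fetched from the URL built above.
descriptor = json.loads("""
{
  "name": "example-dataset",
  "resources": [
    {
      "name": "data-csv",
      "path": "data/data.csv",
      "schema": {"fields": [{"name": "year", "type": "integer"}]}
    }
  ]
}
""")

# Each entry in "resources" carries the file path and (optionally) a schema
for res in descriptor["resources"]:
    print(res["name"], "->", res["path"])
```

From there, each resource's path (relative or absolute) points at the actual CSV to load.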

DataHub.io chat channel

We also have a dedicated chat channel just for datahub.io at http://gitter.im/datahubio/chat

Heidi Seibold
@HeidiBaya_twitter
Cool, thanks. I will keep trying ;)
Rufus Pollock
@rufuspollock
@roll what is the API spec for the R development and @kleanthisk10 what is the current status - can i track this somewhere. We need this so we can update the instructions in datahub.io for R users :-)
roll
@roll
@rufuspollock As for all implementations, the spec is the high-level requirements at http://specs.frictionlessdata.io/implementation/ plus the recommendation to stay as close as possible to the reference implementation - https://github.com/frictionlessdata/implementations#implementation (at least for naming)
Kleanthis
@kleanthisk10
@rufuspollock the R library implementations are here: https://github.com/okgreece/tableschema-r and https://github.com/okgreece/datapackage-r. The remaining to-dos for tableschema can be found at okgreece/tableschema-r#1, and those for datapackage will be available soon. Of course they will be pulled into the frictionlessdata repo when they're ready!
strets123
@strets123
Hi, Andy Stretton from Zegami here, I am interested in improving the talk I am doing about datapackage pipelines. What I would like to know is whether there is a preferred style for adding images to datapackages.
Rufus Pollock
@rufuspollock
@strets123 what kind of images? Graphs of stuff or just simple pngs / jpgs?
strets123
@strets123
Would be simple jpegs - each data record could have one or more of them. I am creating datapackage pipelines to download data from various open data APIs and there are images which I want to do local processing on e.g. tensorflow etc.
I also have some other use case like an image which is then cut into a set of sub-images and a measurement made on each, as in recognising cells on a microscope slide.
Rufus Pollock
@rufuspollock
@strets123 so you can include images as your resources (or tar/gz them together). If you want to associate an image with a resource or a data package you can use the image property http://specs.frictionlessdata.io/data-package/#image
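For context, the image property referenced above sits alongside the other package metadata. This fragment is illustrative only (names and URLs are made up):

```json
{
  "name": "cell-images",
  "image": "http://example.com/dataset-thumbnail.png",
  "resources": [
    {"name": "slides", "path": "images/slides.tar.gz"}
  ]
}
```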
strets123
@strets123
Is there a tag in tableschema to say that a column refers to an image?
Rufus Pollock
@rufuspollock
No, not atm
strets123
@strets123
OK, thanks for the info
Rufus Pollock
@rufuspollock
But you could support it yourself - i will tell you later how
strets123
@strets123
Thanks
I was thinking of borrowing some attributes from http://iiif.io/
strets123
@strets123
but would be nice to know what the most minimal approach is
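One minimal approach, offered as an assumption rather than anything the spec defines: Table Schema tolerates extra properties on a field descriptor, so a publisher could mark image columns with a custom key. The mediatype property on the field below is invented for illustration:

```json
{
  "fields": [
    {"name": "cell_id", "type": "integer"},
    {
      "name": "image",
      "type": "string",
      "format": "uri",
      "mediatype": "image/jpeg"
    }
  ]
}
```

Consumers that don't know the custom key would simply ignore it and treat the column as a plain URI string.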
Oleg Lavrovsky
@loleg
@jobarratt @pwalsh thanks for the chat today! Here are two projects I mentioned http://github.com/datalets/dribdat & https://github.com/schoolofdata-ch/datacentral
roll
@roll
@loleg welcome to the spec implementers team!
Oleg Lavrovsky
@loleg
thanks a lot @roll
Rufus Pollock
@rufuspollock

@jobarratt @pwalsh thanks for the chat today! Here are two projects I mentioned http://github.com/datalets/dribdat & https://github.com/schoolofdata-ch/datacentral

dribdat looks cool :sparkles:

And great to see you here @loleg

Oleg Lavrovsky
@loleg
I've been lurking for a while @rufuspollock & thanks :)
Rufus Pollock
@rufuspollock
Yes, great to see you active here :-)
jobarratt
@jobarratt
great to have you here @loleg and delighted that you are going to be working with us on implementations of the libraries in Julia
Oleg Lavrovsky
@loleg
Indeed, looking forward to get in gear. But first, we have a hackathon to run - and that means making more data packages! https://discuss.okfn.org/t/october-27-open-tourism-data-hackathon/5860
Rufus Pollock
@rufuspollock

Indeed, looking forward to get in gear. But first, we have a hackathon to run - and that means making more data packages! https://discuss.okfn.org/t/october-27-open-tourism-data-hackathon/5860

@loleg that's great and you can start pushing your data packages to the new https://datahub.io/ - if you are interested in being an alpha publisher user just sign up and then fill in the short questionnaire ...

jobarratt
@jobarratt
looks great @loleg! If there is something FD-related you are working on, always let us (especially @callmealien) know here so we can give it a bit of an extra promotion push! You may already be in touch with the OKI comms team, but if not and you want to pitch a blog, we'll always be happy to support you with it
Meiran Zhiyenbayev
@Mikanebu

Data Package v1 Specifications. What has Changed and how to Upgrade

This post walks you through the major changes in the Data Package v1 specs compared to pre-v1. It covers changes in the full suite of Data Package specifications including Data Resources and Table Schema. It is particularly valuable if:

  • you were using Data Packages pre v1 and want to know how to upgrade your datasets
  • you are implementing Data Package related tooling and want to know how to upgrade your tools, or want to support or auto-upgrade pre-v1 Data Packages for backwards compatibility

You can find the entire blogpost here http://datahub.io/blog/upgrade-to-data-package-specs-v1

Stephen Gates
@Stephen-Gates
What's the difference between sources in the Data Resource spec and sources in a Data Package? Sources in the resource don't explicitly inherit from the package like licences do. So why have both?
Meiran Zhiyenbayev
@Mikanebu
@Stephen-Gates Thanks for asking this question. We will provide an answer soon, reading through the discussion in specs.
Rufus Pollock
@rufuspollock
@Mikanebu you can have sources in both and there is no specific semantic on inheritance. sources in the data package can be taken as sources for the whole data package, whilst for a given resource they are just for that resource ...
Meiran Zhiyenbayev
@Mikanebu
@rufuspollock Thanks for clarifying this
Stephen Gates
@Stephen-Gates
@rufuspollock if that's the case, why not have a statement similar to licenses, licenses: as for Data Package metadata. If not specified the resource inherits from the data package.
Rufus Pollock
@rufuspollock

@Stephen-Gates because i don't think the specific resource inherits in a defined sense like licenses. sources are less specific in that sense - whereas licenses obviously filter down, the sources you specify may apply to some resources but not others etc.

I guess my question is more to you: what semantics do you want and why :-) ?

Stephen Gates
@Stephen-Gates
@rufuspollock From a convenience perspective, I think you should be able to define a licence or sources once at the package level and explicitly say resources inherit. If sources vary at the resource level, specify them at that level and don't specify at the package level. Given licence compatibility issues, you could specify different licences at the resource level and not have a licence at the package level. The specs support this apart from explicit inheritance of sources from the package. This could be fixed in the Data Resource spec by sources: as for Data Package metadata. If not specified the resource inherits from the data package.
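For reference, under the current v1 spec the two sources arrays are simply independent; a descriptor can carry both, as in this made-up fragment (titles and URLs are invented):

```json
{
  "name": "example",
  "sources": [
    {"title": "Dataset-wide source", "path": "http://example.com/all"}
  ],
  "resources": [
    {
      "name": "regional-data",
      "path": "data/regional.csv",
      "sources": [
        {"title": "Resource-specific source", "path": "http://example.com/regional"}
      ]
    }
  ]
}
```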
Stephen Gates
@Stephen-Gates
Logged at frictionlessdata/specs#541
Byron Ruth
@bruth
Good afternoon. I am reviewing the various specifications and had two questions. First, have you come across a use case where a "query" is represented as a data resource? The assumption being that the dataset is a function of the query at the time it is executed. And second, is there any support for, or are there examples of, including and/or deriving provenance (PROV or otherwise) from data resources?