roll
@roll
@loleg welcome to the spec implementers team!
Oleg Lavrovsky
@loleg
thanks a lot @roll
Rufus Pollock
@rufuspollock

@jobarratt @pwalsh thanks for the chat today! Here are two projects I mentioned http://github.com/datalets/dribdat & https://github.com/schoolofdata-ch/datacentral

dribdat looks cool :sparkles:

And great to see you here @loleg

Oleg Lavrovsky
@loleg
I've been lurking for a while @rufuspollock & thanks :)
Rufus Pollock
@rufuspollock
Yes, see you actively here :-)
jobarratt
@jobarratt
great to have you here @loleg and delighted that you are going to be working with us on implementations of the libraries in Julia
Oleg Lavrovsky
@loleg
Indeed, looking forward to getting in gear. But first, we have a hackathon to run - and that means making more data packages! https://discuss.okfn.org/t/october-27-open-tourism-data-hackathon/5860
Rufus Pollock
@rufuspollock

Indeed, looking forward to getting in gear. But first, we have a hackathon to run - and that means making more data packages! https://discuss.okfn.org/t/october-27-open-tourism-data-hackathon/5860

@loleg that's great and you can start pushing your data packages to the new https://datahub.io/ - if you are interested in being an alpha publisher user just sign up and then fill in the short questionnaire ...

jobarratt
@jobarratt
looks great @loleg ! if there is something you are working on that is FD related always let us (especially @callmealien ) know here so we can give it a bit of an extra promotion push! And you may already be in touch with the OKI comms team but if not and you want to pitch a blog we'll always be happy to support you with it
Meiran Zhiyenbayev
@Mikanebu

Data Package v1 Specifications. What has Changed and how to Upgrade

This post walks you through the major changes in the Data Package v1 specs compared to pre-v1. It covers changes in the full suite of Data Package specifications including Data Resources and Table Schema. It is particularly valuable if:

  • you were using Data Packages pre v1 and want to know how to upgrade your datasets
  • you are implementing Data Package related tooling and want to know how to upgrade your tools, or want to support or auto-upgrade pre-v1 Data Packages for backwards compatibility

You can find the entire blogpost here http://datahub.io/blog/upgrade-to-data-package-specs-v1

Stephen Gates
@Stephen-Gates
What's the difference between sources in the Data Resource spec and sources in a Data Package? Sources in the resource don't explicitly inherit from the package like licences do. So why have both?
Meiran Zhiyenbayev
@Mikanebu
@Stephen-Gates Thanks for asking this question. We will provide an answer soon, reading through the discussion in specs.
Rufus Pollock
@rufuspollock
@Mikanebu you can have sources in both and there are no specific semantics for inheritance. sources in a data package can be taken as sources for the whole data package, whilst sources on a given resource apply just to that resource ...
Meiran Zhiyenbayev
@Mikanebu
@rufuspollock Thanks for clarifying this
Stephen Gates
@Stephen-Gates
@rufuspollock if that's the case, why not have a statement similar to the one for licenses: "licenses: as for Data Package metadata. If not specified the resource inherits from the data package."
Rufus Pollock
@rufuspollock

@Stephen-Gates because i don't think the specific resource inherits in a defined sense like licenses. sources are less specific in that sense: whereas licenses obviously filter down, the sources you specify may apply to some resources but not others etc.

I guess my question is more to you: what semantics do you want and why :-) ?

Stephen Gates
@Stephen-Gates
@rufuspollock From a convenience perspective, I think you should be able to define a licence or sources once at the package level and explicitly say resources inherit. If sources vary at the resource level, specify them at that level and don't specify them at the package level. Given licence compatibility issues, you could specify different licences at the resource level and not have a licence at the package level. The Specs support this apart from explicit inheritance of sources from the package. This could be fixed in the data resource spec by adding: "sources: as for Data Package metadata. If not specified the resource inherits from the data package."
Stephen Gates
@Stephen-Gates
Logged at frictionlessdata/specs#541
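The inheritance rule proposed above can be sketched in a few lines of Python. This is only an illustration of the semantics under discussion, not behaviour taken from any Frictionless Data library: the helper name effective_property and the descriptor contents are hypothetical, and the specs as discussed here define this fallback for licenses only, not for sources.

```python
def effective_property(package: dict, resource: dict, key: str) -> list:
    """Return the resource's own value for `key`, else the package's, else []."""
    if key in resource:
        return resource[key]
    return package.get(key, [])


# Hypothetical descriptor, used only to exercise the helper.
package = {
    "name": "example-package",
    "licenses": [{"name": "CC-BY-4.0"}],
    "sources": [{"title": "National statistics office"}],
    "resources": [
        {"name": "inherits-both", "path": "a.csv"},
        {
            "name": "own-sources",
            "path": "b.csv",
            "sources": [{"title": "Field survey"}],
        },
    ],
}

for resource in package["resources"]:
    print(
        resource["name"],
        effective_property(package, resource, "licenses"),
        effective_property(package, resource, "sources"),
    )
```

Under this rule a resource that declares its own sources overrides the package-level list rather than merging with it, which is one possible answer to the "what semantics do you want" question.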
Byron Ruth
@bruth
Good afternoon. I am reviewing the various specifications and had two questions. First have you come across a use case where a "query" is being represented as a data resource? The assumption being that the dataset is a function of the query at the time it is executed. And second, are there any support/examples for including and/or deriving provenance (PROV or otherwise) from data resources?
Stephen Gates
@Stephen-Gates
Hi @bruth provenance in a data package is usually provided in the readme.md. Here's a sample I'm using. Of course you could write anything in the markdown file. I'm intrigued about how you can derive provenance from a data resource. How could you determine what processing has been done by just looking at the end result?
The readme.md is a file included in the datapackage.zip http://specs.frictionlessdata.io/data-package/#illustrative-structure
Byron Ruth
@bruth
Thanks @Stephen-Gates. All that can be derived are changes from one version to the next (more rows, changed values, etc.). You are correct in that the intent/cause of the change is not known unless you have the context. For my use case, I will have this information since new revisions of a dataset will prompt the user (committing the new version) for a reason.
I am evaluating FD for the specs and tooling as the basis for a "data sharing platform" within my org. I have come across other specs in the past, but FD feels the most nimble and active. Extensibility is important since we may need to add additional metadata specific to my org. It appears that this is allowed within the specs.
Rufus Pollock
@rufuspollock

Good afternoon. I am reviewing the various specifications and had two questions. First have you come across a use case where a "query" is being represented as a data resource?

@bruth yes we've definitely thought about this use case. You could definitely use it this way.

And second, are there any support/examples for including and/or deriving provenance (PROV or otherwise) from data resources?

You can use the sources attribute. Would you want more than that?

@bruth and welcome - great to have your questions and interest :-)
@bruth and if you are interested in a data package oriented "data sharing platform" you can check out https://datahub.io/
Byron Ruth
@bruth
@rufuspollock Thanks! Treating a query as a data resource is sort of weird, and may be more appropriate as provenance itself for the dataset being produced. The sources and contributors attributes are a good start for provenance. I need to evaluate to what extent I need/want to embed all provenance information in the datapackage.json, or whether I would reference a changelog/PROV graph of sorts
datahub looks very nice. i like how a dataset is presented. for my use case, this would be internal to the organization, so unfortunately I can't use this hosted version.
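As a rough sketch of the two options weighed above, the descriptor below embeds basic provenance through sources and contributors and points to an external changelog through a custom property. All names and values are made up for illustration; in particular, the x-provenance key is a hypothetical org-specific extension, not something defined by the specs.

```python
import json

# Hypothetical descriptor: every value here is illustrative only.
descriptor = {
    "name": "lab-results",
    "resources": [{"name": "results", "path": "results.csv"}],
    # Embedded provenance using the standard attributes mentioned above.
    "sources": [{"title": "Internal LIMS export", "path": "https://example.org/lims"}],
    "contributors": [{"title": "Jane Doe", "role": "maintainer"}],
    # Assumption: an org-specific extension property that references a
    # changelog kept alongside datapackage.json instead of embedding it.
    "x-provenance": {"changelog": "CHANGELOG.md"},
}

print(json.dumps(descriptor, indent=2))
```

Keeping the revision history in a referenced file keeps datapackage.json small, at the cost of tooling having to know about the extra property.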
Stephen Gates
@Stephen-Gates
@bruth if you need an internal solution to create data packages, you may be interested in a project I'm leading http://data-curator.io - work in progress - v1.0.0 due before Christmas
Byron Ruth
@bruth
@Stephen-Gates This looks very promising. I am going to try it out. My org will be hiring a few library scientists to help in the data curation/documentation process of datasets. This could be a useful tool for them to assist in this process.
Stephen Gates
@Stephen-Gates
@bruth current release can open, edit, save data, guess or set column properties, validate data. These milestones describe this year's plan and we're seeking funding for version 2.
Stephen Gates
@Stephen-Gates
I'm interested in implementing the pattern allowing a foreign key to reference another data package. Does anyone have examples of this? Also, what level of adoption is required for work to commence on including this in the table schema standard?
Stephen Gates
@Stephen-Gates
I did find the concept was in the spec at one stage.
Rufus Pollock
@rufuspollock
@Stephen-Gates i've played around with implementing and we're thinking about this in datahub.io so if you were working on this you'd definitely have someone to chat with ... (and experiment with)
Stephen Gates
@Stephen-Gates
Great @rufuspollock just starting to spec Data Curator v2 for next year and the ability to reference an external table for validation is a feature desired by our sponsor.
Rufus Pollock
@rufuspollock
@Stephen-Gates :+1: - i agree it is a really useful feature
Stephen Gates
@Stephen-Gates

If I have a Data Package that contains a Data Resource that is shared under Public Domain, can someone please confirm that the licenses properties should be:

name : other-pd
path :
title : Other (Public Domain)

Based on http://licenses.opendefinition.org/licenses/other-pd.json from http://licenses.opendefinition.org/

Thanks
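For reference, the entry in question could be written out and sanity-checked as below. This is a sketch under an assumption that should be verified against the Data Package spec: that each licenses entry must carry a name and/or a path, with title optional. The helper has_name_or_path is hypothetical, and path is simply omitted here because the question leaves it empty.

```python
import json


def has_name_or_path(entry: dict) -> bool:
    """Assumed rule: a license entry needs a non-empty name and/or path."""
    return bool(entry.get("name") or entry.get("path"))


# Values copied from the question; "path" is left out rather than set to "".
licenses = [{"name": "other-pd", "title": "Other (Public Domain)"}]

print(json.dumps({"licenses": licenses}, indent=2))
print(all(has_name_or_path(entry) for entry in licenses))
```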

roll
@roll
@Stephen-Gates Based on the current state of datapackage-js, referencing external data packages is a relatively simple feature to add (integrity check + dereferencing). It should cost a few hours of work.
But I think we should add proper Data Package Identifier support first
Stephen Gates
@Stephen-Gates
Thanks @roll but I'm not sure I understand. Are you saying that the data package url, data resource location and foreign key field names in the current specification are inadequate to locate the data to perform an integrity check? Are you suggesting that a data package identifier would help with the fact that the data at that location could change? Do you think the use of an identifier would be mandatory or a best practice?
Paul Walsh
@pwalsh
@Stephen-Gates I think that @roll is saying that actually implementing foreign keys / references across data packages is quite simple in the JavaScript Data Package library that we maintain. However, he is reluctant to just go ahead and do so without some other things in place first.
roll
@roll
@pwalsh @Stephen-Gates I meant that implementing support for external referencing by a descriptor URL could be literally just a few lines added to https://github.com/frictionlessdata/datapackage-js/blob/master/src/resource.js#L396 (load the external DP there instead of the current one). No problem to add it if it's needed.

My second thought was just that I suppose end users would much prefer to reference by package name rather than by url, like:

{"fields": "country", "reference": {"package": "country-codes", "resource": "countries", "fields": "code"}}

But support for the identifiers spec (which is also easy) could surely just be the next step after basic external referencing support. So I wasn't clear here: it's not any kind of requirement or preference of mine about implementation order.
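The "integrity check + dereferencing" idea for a cross-package foreign key can be sketched in Python as follows. This is not the datapackage-js implementation: the function check_foreign_key, the in-memory packages mapping (standing in for loading an external descriptor by name or URL) and the sample rows are all hypothetical, and only single-field keys are handled.

```python
def check_foreign_key(rows, foreign_key, packages):
    """Return the rows whose key value is missing from the referenced resource.

    `packages` maps package name -> resource name -> list of row dicts; it is
    a stand-in for loading an external data package. Single-field keys only.
    """
    reference = foreign_key["reference"]
    target_rows = packages[reference["package"]][reference["resource"]]
    valid_values = {row[reference["fields"]] for row in target_rows}
    return [row for row in rows if row[foreign_key["fields"]] not in valid_values]


# Hypothetical external package, keyed the way the example above references it.
packages = {"country-codes": {"countries": [{"code": "CH"}, {"code": "GB"}]}}

foreign_key = {
    "fields": "country",
    "reference": {"package": "country-codes", "resource": "countries", "fields": "code"},
}

rows = [
    {"city": "Bern", "country": "CH"},
    {"city": "Atlantis", "country": "XX"},
]

violations = check_foreign_key(rows, foreign_key, packages)
print(violations)  # [{'city': 'Atlantis', 'country': 'XX'}]
```

Swapping the packages mapping for a loader that fetches a descriptor by URL would give the URL-based referencing asked about earlier, without changing the check itself.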

Stephen Gates
@Stephen-Gates
Thanks @roll. Reference by URL is all I was seeking. I assume that a change to the Table Schema standard would be required before a change to the code could be made?
roll
@roll
@Stephen-Gates No, I think we're good to go because for now only one mention of external referencing is in patterns - http://specs.frictionlessdata.io/patterns/#table-schema:-foreign-keys-to-data-packages. And it allows url referencing. Other question - should it be a datapackage property or package property? cc @rufuspollock
Serah Njambi Rono
@serahrono
New Frictionless Data pilot case study on eLife's use of goodtables for data validation of scientific research data http://frictionlessdata.io/case-studies/elife/
Rufus Pollock
@rufuspollock

And it allows url referencing. Other question - should it be a datapackage property or package property? cc @rufuspollock

I guess we could switch to simple package - do you have a preference or suggestion?

roll
@roll
@rufuspollock I think our intention on all levels is to use consistent package/resource OR datapackage/dataresource. So having a fk.reference.resource already suggests to use package.
Also, the pattern currently says: "The foreignKey MAY have a property datapackage. This property is a string being a url pointing to a Data Package or is the name of a datapackage." If there is an intention to use the Data Package Identifier spec across the main specs, I think we need to mention it here instead of just a url/name.
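One way a tool could cope with both spellings while the naming is being settled is sketched below. It assumes the property sits inside reference, as in the earlier example, and treats any http(s) value as a descriptor URL and anything else as a package name; the function package_reference and its return shape are hypothetical, and Data Package Identifier strings are not handled.

```python
from urllib.parse import urlparse


def package_reference(reference: dict):
    """Classify a foreign key reference as ("local" | "url" | "name", value).

    Accepts either the `package` or the older `datapackage` property name.
    """
    value = reference.get("package", reference.get("datapackage"))
    if value is None:
        # No package given: the reference points inside the current package.
        return ("local", None)
    if urlparse(value).scheme in ("http", "https"):
        return ("url", value)
    return ("name", value)


print(package_reference({"package": "country-codes", "resource": "countries", "fields": "code"}))
print(package_reference({"datapackage": "https://example.org/datapackage.json", "resource": "countries", "fields": "code"}))
print(package_reference({"resource": "countries", "fields": "code"}))
```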