roll
@roll
But I think we should add proper Data Package Identifier support first
Stephen Gates
@Stephen-Gates
Thanks @roll but I'm not sure I understand. Are you saying that using the data package url, data resource location and the foreign key field names in the current specification is inadequate to locate the data to perform an integrity check? Are you suggesting that a data package identifier would assist with the fact that the data at that location could change? Do you think the use of an identifier would be mandatory or a best practice?
Paul Walsh
@pwalsh
@Stephen-Gates I think that @roll is saying that actually implementing foreign keys / references across data packages is quite simple in the JavaScript Data Package library that we maintain. However, he is reluctant to just go ahead and do so without some other things in place first.
roll
@roll
@pwalsh @Stephen-Gates I meant that implementing support for external referencing by a descriptor URL could literally be just a few lines' addition to https://github.com/frictionlessdata/datapackage-js/blob/master/src/resource.js#L396 (load the external DP there instead of the current one). No problem adding it if it's needed.

My second thought was just that I suppose end users would much prefer the ability to reference by package name rather than by URL, like:

{fields: "country", reference: {package: "country-codes", resource: "countries", fields: "code"}}

But support for the identifiers spec (which is also easy) could surely be just the next step after basic external referencing support. I wasn't clear here: it's not any kind of requirement or preference about the implementation order.
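
For comparison, the same reference by URL (a made-up descriptor URL, with the same property names as above) could look like:

{fields: "country", reference: {package: "https://example.com/country-codes/datapackage.json", resource: "countries", fields: "code"}}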

Stephen Gates
@Stephen-Gates
Thanks @roll. Reference by URL is all I was seeking. I assume that a change to the Table Schema standard would be required before a change to the code could be made?
roll
@roll
@Stephen-Gates No, I think we're good to go, because for now the only mention of external referencing is in the patterns: http://specs.frictionlessdata.io/patterns/#table-schema:-foreign-keys-to-data-packages. And it allows URL referencing. Another question: should it be a datapackage property or a package property? cc @rufuspollock
Serah Njambi Rono
@serahrono
New Frictionless Data pilot case study on eLife's use of goodtables for data validation of scientific research data http://frictionlessdata.io/case-studies/elife/
Rufus Pollock
@rufuspollock

And it allows URL referencing. Another question: should it be a datapackage property or a package property? cc @rufuspollock

I guess we could switch to a simple package property. Do you have a preference or suggestion?

roll
@roll
@rufuspollock I think our intention on all levels is to use a consistent package/resource OR datapackage/dataresource. So having fk.reference.resource already suggests using package.
Also, the pattern says: "The foreignKey MAY have a property datapackage. This property is a string being a url pointing to a Data Package or is the name of a datapackage." If there is an intention to use the Data Package Identifier spec across the main specs, I think we need to mention it here instead of just a url/name.
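To illustrate the two forms that wording allows, a url or a name (values are invented; the property is shown inside reference as in the earlier example):

{fields: "country", reference: {datapackage: "https://example.com/datapackage.json", resource: "countries", fields: "code"}}
{fields: "country", reference: {datapackage: "country-codes", resource: "countries", fields: "code"}}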
Paul Walsh
@pwalsh

@roll @rufuspollock I just don't know why we would tie this to Data Package. We have Data Resource as a distinct spec now - there is no reason why publishers could not publish distinct Data Resources.

So, I don't see why fk.reference.resource needs to suggest usage of a data package. This again ties back to the JSON Pointer thing - I still do not see the benefit of us having custom DP identifiers, assumptions about FKs to packages, etc. etc. when we can just reuse existing specifications that are quite simple and designed for this type of referencing.

I'd really like to see the argument for why a custom approach is better, rather than me continuously jumping in on these conversations and saying "but .... json pointer" :)
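
For illustration, a JSON Pointer based reference might look something like this (purely a sketch with a made-up URL; the fragment is a standard RFC 6901 pointer into the package descriptor):

{fields: "country", reference: {resource: "https://example.com/datapackage.json#/resources/0", fields: "code"}}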

roll
@roll
I think we need to compare things using real examples (cc @Stephen-Gates). I suppose there were a lot of ideas on different referencing approaches, e.g. @akariv's one with resource referencing (though if I remember correctly it was not directly JSON Pointers).
There is also Metatab's one: https://github.com/Metatab/appurl
roll
@roll
I have a feeling that the Data Package Identifier spec (which is almost not implemented anywhere for now) could be evolved into a more generic referencing specification which supports versioning (datahub.io), resource referencing, row referencing, cell referencing, etc. All this stuff has been in the air lately, appearing in many issue discussions.

So instead of package property it could be something like this:

{fields: "country", reference: {resource: "country-codes#countries", fields: "code"}}

Not saying that I like something like this more than, e.g., using JSON Pointer, but it could be an option.

roll
@roll
On the other hand, the ecosystem is very Data Package centric (e.g. datahub.io stores packages, not resources), so even having the Data Resource specification use a package property feels kinda natural.
Stephen Gates
@Stephen-Gates
My real world example: every year, every government department must publish the same data tables in their annual reports. These are also published as open data. An Excel spreadsheet is sent around to all departments to enter the data. The spreadsheet has validation rules to control the data entered (think enum constraint or FK lookup). With Frictionless Data, a template Data Package with a Table Schema is circulated to aid in collecting data. This references a table on CKAN that contains the FK lookup values. The data is validated by each department and the data is consistent for every department. The following year there may be a change to the FK lookup table - I assume this change would result in a new URL for the dataset in CKAN (I could be wrong here). The process is repeated to collect and validate the data. Hope that helps.
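To make that concrete, the circulated Table Schema could carry a foreign key like this (the department names, field names and CKAN URL are invented for illustration):

{fields: "department_code", reference: {package: "https://data.example.gov/dataset/departments/datapackage.json", resource: "departments", fields: "code"}}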
Kenji
@Kenji-K
Just poppin in to say that I love what you guys are doing
Kenji
@Kenji-K
Well maybe I should also use this opportunity to ask a question, given that I am just recently getting my feet wet with respect to this subject. What is the relationship between the FD specs and Dublin Core? I have an idea in my head but I'd rather not muddy the waters with my misconceptions and will refrain from putting it here.
Stephen Gates
@Stephen-Gates

Hi, I'm updating the list of open licenses and its data package in https://github.com/okfn/licenses.

I'm validating my changes with Goodtables.io at http://goodtables.io/github/Stephen-Gates/licenses

In my okfn/licenses#57, I accidentally introduced an error by mis-spelling "superseded" as "superceded". Goodtables.io didn't fail the data despite my enum constraint.

I'm wondering if there's an error in my table schema or if GoodTables has a bug?
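
For reference, the kind of field definition involved looks like this (the field name and values are illustrative, not the actual okfn/licenses schema):

{"name": "status", "type": "string", "constraints": {"enum": ["active", "superseded", "retired"]}}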

roll
@roll
@Stephen-Gates Thanks. I'm looking into it
roll
@roll
@Stephen-Gates It should work now
Stephen Gates
@Stephen-Gates
Thanks @roll. I just pushed a change to licenses.csv with one error remaining ("superseded" on L71) but it still passed the GoodTables test.
Martín n
@martinszy
Hello, I'm testing datapackage-pipelines
Rufus Pollock
@rufuspollock
@martinszy hey there, that's great!
Martín n
@martinszy
I have a couple of questions:
1) I have changing filenames; is there any way to use wildcards in add_resource?
2) Is it a bad idea to have a dump.to_ckan action?
3) Does CKAN handle data packages yet?
@amercader and @brew can answer any questions you have about the CKAN integrations next week
Martín n
@martinszy
thanks, I'll check that out
Martín n
@martinszy
For 1) I'm thinking: either create a pre-process that generates the YAML file, or modify datapackage-pipelines to allow for custom_resource_adders or something like that, which listen for multiple files and then trigger the rest of the process... but I haven't analyzed this properly yet
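A generated pipeline-spec.yaml per discovered file could be as small as this sketch (processor names are from the datapackage-pipelines standard library; the pipeline name and file path are invented):

my-report:
  pipeline:
    - run: add_resource
      parameters:
        name: report
        url: data/report-2018-01.csv
    - run: stream_remote_resources
    - run: dump.to_path
      parameters:
        out-path: output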
roll
@roll
@Stephen-Gates Could you please try again?
Stephen Gates
@Stephen-Gates
@roll working perfectly (see Job History). The PR okfn/licenses#57 is now good to go. Thanks so much for your help.
Brook Elgie
@brew
@martinszy Don't forget you can generate pipelines dynamically using a "Generator": https://github.com/frictionlessdata/datapackage-pipelines/#plugins-and-source-descriptors Perhaps this would help with the flexibility you're after?
Martín n
@martinszy
@brew awesome! I'll check it out!!
Oleg Lavrovsky
@loleg
Good morning & greetings from a park bench (47.37226921, 8.54668868) next to the office of data.stadt-zurich.ch, where I'm starting my new project with you today.
I'm starting with a review of the Python implementation, which I already started testing last week in combination with a tool we use to work with data providers & teams at hackathons. Thanks @callmealien @pwalsh @jobarratt for your help getting plugged in so far! I'll be keeping an open dev log, and you can mention me here anytime if you have questions or suggestions.
Vitor Baptista
@vitorbaptista
@loleg Good luck with the new project! Do you plan on publishing your open dev log somewhere?
Oleg Lavrovsky
@loleg
@vitorbaptista absolutely, it'll be on GitHub at least in raw form later today
Vitor Baptista
@vitorbaptista
@loleg Cool! Please let me know when it's online
Oleg Lavrovsky
@loleg
@vitorbaptista thanks for your enthusiasm :) https://github.com/loleg/devlog/tree/master/content
Vitor Baptista
@vitorbaptista
:tada: :smile:
Oleg Lavrovsky
@loleg
Are there notes anywhere on the roots of the standard, specifically how much it owes to (and potentially influences the future of) https://github.com/ckan/ckan/blob/master/ckan/logic/schema.py#L206 ?
Oleg Lavrovsky
@loleg
(or is this a touchy topic I should leave to later discussion..)
Stephen Gates
@Stephen-Gates

Hi, I'm looking for test datapackage.zip files. I came across https://github.com/frictionlessdata/testsuite-extended and https://github.com/frictionlessdata/example-data-packages but these didn't help. I couldn't find datapackage.zip files on datahub.io either.

Any suggestions on a source for data package test data and if not, where is the best place to contribute these?

Rufus Pollock
@rufuspollock

Are there notes anywhere on the roots of the standard, specifically how much it owes to (and potentially influences the future of) https://github.com/ckan/ckan/blob/master/ckan/logic/schema.py#L206 ?

Not a touchy topic at all. If you go back to the pre-history of Frictionless Data in 2007 or so, then yes: CKAN metadata was partially inspired by Python packaging, and so was the first "dpm" (data package manager). In fact, as you may know, CKAN was originally intended to act like PyPI or CPAN, CRAN, etc.

Over time, the source of inspiration for data packages has shifted a bit towards more recent packaging systems like Node + package.json (PyPI was probably not the best initial inspiration).

This is something that would probably be worth a discuss.okfn.org question so we can write it up there for posterity :-)

Oleg Lavrovsky
@loleg
Got it, will do, thanks Rufus!
Meiran Zhiyenbayev
@Mikanebu

Core Data: Essential Datasets for Data Wranglers and Data Scientists

This post introduces you to Core Data, presents a couple of examples, and shows you how you can easily access and use core data from your own tools and systems, including R, Python, Pandas and more.
http://datahub.io/blog/core-data-essential-datasets-for-data-wranglers-and-data-scientists

This is a blog post; to read the full text, please follow the link above.

Jeremy Palmer
@palmerj
Hi All!
I was wondering what it would take to be involved in the next development of the specifications, in particular the Tabular Data Package.