Stephen Gates
@Stephen-Gates
I'm interested in implementing the pattern allowing a foreign key to reference another data package. Does anyone have examples of this? Also, what level of adoption is required for work to commence on including this in the table schema standard?
Stephen Gates
@Stephen-Gates
I did find the concept was in the spec at one stage.
Rufus Pollock
@rufuspollock
@Stephen-Gates I've played around with implementing this, and we're thinking about it in datahub.io, so if you were working on this you'd definitely have someone to chat with ... (and experiment with)
Stephen Gates
@Stephen-Gates
Great @rufuspollock. I'm just starting to spec Data Curator v2 for next year, and the ability to reference an external table for validation is a feature desired by our sponsor.
Rufus Pollock
@rufuspollock
@Stephen-Gates :+1: - I agree it is a really useful feature
Stephen Gates
@Stephen-Gates

If I have a Data Package that contains a Data Resource that is shared under Public Domain, can someone please confirm that the licenses property should be:

name : other-pd
path :
title : Other (Public Domain)

Based on http://licenses.opendefinition.org/licenses/other-pd.json from http://licenses.opendefinition.org/

Thanks
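
For reference, a hedged sketch of how that entry might appear in a Data Package descriptor, using only the values given above (path is omitted here because none was supplied; whether that is acceptable for other-pd is an assumption, not confirmed in this discussion):

{
  "licenses": [
    {
      "name": "other-pd",
      "title": "Other (Public Domain)"
    }
  ]
}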

roll
@roll
@Stephen-Gates Based on the current state of datapackage-js, referencing external data packages is a relatively simple feature to add (integrity check + dereferencing). It should cost a few hours of work.
But I think we should add proper Data Package Identifier support first
Stephen Gates
@Stephen-Gates
Thanks @roll but I'm not sure I understand. Are you saying that the data package URL, data resource location, and foreign key field names in the current specification are inadequate to locate the data and perform an integrity check? Are you suggesting that a data package identifier would help with the fact that the data at that location could change? Do you think the use of an identifier would be mandatory or a best practice?
Paul Walsh
@pwalsh
@Stephen-Gates I think that @roll is saying that actually implementing foreign keys / references across data packages is quite simple in the JavaScript Data Package library that we maintain. However, he is reluctant to just go ahead and do so without some other things in place first.
roll
@roll
@pwalsh @Stephen-Gates I meant that implementing support for external referencing by a descriptor URL could literally be a few lines added to https://github.com/frictionlessdata/datapackage-js/blob/master/src/resource.js#L396 (load the external DP there instead of the current one). No problem adding it if it's needed.

My second thought was just that I suppose end-users would much prefer the ability to reference by package name rather than by URL, e.g.:

{fields: "country", reference: {package: "country-codes", resource: "countries", fields: "code"}}

But support for the identifiers spec (which is also easy) could surely be the next step after basic external referencing support. So I wasn't clear here: it's not any kind of requirement, nor my preference about the implementation order.
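
For illustration, a complete foreignKeys entry in a Table Schema along the lines being discussed might look like the sketch below. The URL is hypothetical, and the package property name follows roll's example above; the name-based form would simply use "country-codes" instead of a URL:

{
  "foreignKeys": [
    {
      "fields": "country",
      "reference": {
        "package": "https://example.com/country-codes/datapackage.json",
        "resource": "countries",
        "fields": "code"
      }
    }
  ]
}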

Stephen Gates
@Stephen-Gates
Thanks @roll. Reference by URL is all I was seeking. I assume that a change to the Table Schema standard would be required before a change to the code could be made?
roll
@roll
@Stephen-Gates No, I think we're good to go, because for now the only mention of external referencing is in the patterns - http://specs.frictionlessdata.io/patterns/#table-schema:-foreign-keys-to-data-packages. And it allows URL referencing. Another question: should it be a datapackage property or a package property? cc @rufuspollock
Serah Njambi Rono
@serahrono
New Frictionless Data pilot case study on eLife's use of goodtables for data validation of scientific research data http://frictionlessdata.io/case-studies/elife/
Rufus Pollock
@rufuspollock

And it allows URL referencing. Another question: should it be a datapackage property or a package property? cc @rufuspollock

I guess we could switch to a simple package property - do you have a preference or suggestion?

roll
@roll
@rufuspollock I think our intention at all levels is to use consistent package/resource OR datapackage/dataresource. So having fk.reference.resource already suggests using package.
Also, the pattern currently says: "The foreignKey MAY have a property datapackage. This property is a string being a url pointing to a Data Package or is the name of a datapackage." If there is an intention to use the Data Package Identifier spec across the main specs, I think we need to mention it here instead of just a url/name.
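
To make the naming question concrete, the same reference under the two candidate property names would differ only in the key (the URL is hypothetical):

"reference": {"datapackage": "https://example.com/country-codes/datapackage.json", "resource": "countries", "fields": "code"}
"reference": {"package": "https://example.com/country-codes/datapackage.json", "resource": "countries", "fields": "code"}
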
Paul Walsh
@pwalsh

@roll @rufuspollock I just don't know why we would tie this to Data Package. We have Data Resource as a distinct spec now - there is no reason why publishers could not publish distinct Data Resources.

So, I don't see why fk.reference.resource needs to suggest usage of a data package. This again ties back to the JSON Pointer thing - I still do not see the benefit of us having custom DP identifiers, assumptions about FKs to packages, etc. etc. when we can just reuse existing specifications that are quite simple and designed for this type of referencing.

I'd really like to see the argument for why a custom approach is better, rather than me continuously jumping in on these conversations and saying "but .... json pointer" :)
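
For contrast, a reference that reuses plain URLs with JSON Pointer fragments, which seems to be the kind of existing mechanism @pwalsh is pointing at, might look something like the sketch below. The shape is purely illustrative and not an existing Frictionless spec; the fragment #/resources/0 would address the first resource of the referenced descriptor:

{
  "foreignKeys": [
    {
      "fields": "country",
      "reference": {
        "resource": "https://example.com/datapackage.json#/resources/0",
        "fields": "code"
      }
    }
  ]
}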

roll
@roll
I think we need to compare things on real examples (cc @Stephen-Gates). I suppose there have been a lot of ideas on different referencing approaches, e.g. @akariv's one with resource referencing (but if I remember correctly it was not directly json-pointers).
There is also Metatab's one - https://github.com/Metatab/appurl
roll
@roll
I have a feeling that the Data Package Identifier spec (which is almost not implemented anywhere for now) could evolve into a more generic referencing specification that supports versioning (datahub.io), resource referencing, row referencing, cell referencing, etc. All this stuff has been in the air lately, appearing in many issue discussions.

So instead of a package property it could be something like this:

{fields: "country", reference: {resource: "country-codes#countries", fields: "code"}}

Not saying that I like something like this more than, e.g., using json-pointer, but it could be an option.

roll
@roll
On the other hand, the ecosystem is very Data Package centric (e.g. datahub.io stores packages, not resources), so even having the Data Resource specification use a package property feels kinda natural.
Stephen Gates
@Stephen-Gates
My real-world example: every year, every government department must publish the same data tables in their annual reports. These are also published as open data. An Excel spreadsheet is sent around to all departments to enter the data. The spreadsheet has validation rules to control the data entered (think enum constraint or FK lookup). With Frictionless Data, a template Data Package with a Table Schema is circulated to aid in collecting data. This references a table on CKAN that contains the FK lookup values. The data is validated by each department and is consistent across departments. The following year there may be a change to the FK lookup table - I assume this change would result in a new URL for the dataset in CKAN (I could be wrong here). The process is repeated to collect and validate the data. Hope that helps.
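
As a hedged sketch of that workflow, the circulated template's Table Schema might carry an external foreign key into the CKAN-hosted lookup table. All names and the URL are hypothetical, and the property shape follows the pattern discussed above rather than a finalised spec:

{
  "fields": [
    {"name": "program_code", "type": "string"}
  ],
  "foreignKeys": [
    {
      "fields": "program_code",
      "reference": {
        "package": "https://ckan.example.gov/dataset/lookups/datapackage.json",
        "resource": "program-codes",
        "fields": "code"
      }
    }
  ]
}
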
Kenji
@Kenji-K
Just popping in to say that I love what you guys are doing
Kenji
@Kenji-K
Well maybe I should also use this opportunity to ask a question, given that I am just recently getting my feet wet with respect to this subject. What is the relationship between the FD specs and Dublin Core? I have an idea in my head but I'd rather not muddy the waters with my misconceptions and will refrain from putting it here.
Stephen Gates
@Stephen-Gates

Hi, I'm updating the list of open licenses and its data package in https://github.com/okfn/licenses.

I'm validating my changes with Goodtables.io at http://goodtables.io/github/Stephen-Gates/licenses

In my PR okfn/licenses#57, I accidentally introduced an error by misspelling "superseded" as "superceded". Goodtables.io didn't fail the data despite my enum constraint.

I'm wondering if there's an error in my table schema or if GoodTables has a bug?
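
For context, the kind of Table Schema field that should catch such a misspelling is an enum constraint along these lines; the field name and allowed values here are hypothetical rather than copied from okfn/licenses:

{
  "name": "status",
  "type": "string",
  "constraints": {
    "enum": ["active", "superseded", "retired"]
  }
}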

roll
@roll
@Stephen-Gates Thanks. I'm looking into it
roll
@roll
@Stephen-Gates It should work now
Stephen Gates
@Stephen-Gates
Thanks @roll. I just pushed a change to licenses.csv with one error remaining ("superseded", L71) but it still passed the GoodTables test.
Martín n
@martinszy
Hello, I'm testing datapackage-pipelines
Rufus Pollock
@rufuspollock
@martinszy hey there, that's great!
Martín n
@martinszy
I have a couple of questions:
1) I have changing filenames; is there any way to use wildcards in add_resource?
2) Is it a bad idea to have a dump.to_ckan action?
3) Does CKAN handle data packages yet?
@amercader and @brew can answer any questions you have about the CKAN integrations next week
Martín n
@martinszy
thanks, I'll check that out
Martín n
@martinszy
For 1) I'm thinking: either create a pre-process that generates the YAML file, or modify datapackage-pipelines to allow for custom_resource_adders or something like that, which listen for multiple files and then trigger the rest of the process... but I've not analyzed this properly yet
roll
@roll
@Stephen-Gates Could you please try again?
Stephen Gates
@Stephen-Gates
@roll working perfectly (see Job History). The PR okfn/licenses#57 is now good to go. Thanks so much for your help.
Brook Elgie
@brew
@martinszy Don't forget you can generate pipelines dynamically using a "Generator": https://github.com/frictionlessdata/datapackage-pipelines/#plugins-and-source-descriptors Perhaps this would help with the flexibility you're after?
Martín n
@martinszy
@brew awesome! I'll check it out!!
Oleg Lavrovsky
@loleg
Good morning & greetings from a park bench (47.37226921, 8.54668868) next to the office of data.stadt-zurich.ch, where I'm starting my new project with you today.
I'm starting with a review of the Python implementation, which I already started testing last week in combination with a tool we use to work with data providers & teams at hackathons. Thanks @callmealien @pwalsh @jobarratt for your help getting plugged in so far! I'll be keeping an open dev log, and you can mention me here anytime if you have questions or suggestions.
Vitor Baptista
@vitorbaptista
@loleg Good luck with the new project! Do you plan on publishing your open dev log somewhere?
Oleg Lavrovsky
@loleg
@vitorbaptista absolutely, it'll be on GitHub at least in raw form later today
Vitor Baptista
@vitorbaptista
@loleg Cool! Please let me know when it's online
Oleg Lavrovsky
@loleg
@vitorbaptista thanks for your enthusiasm :) https://github.com/loleg/devlog/tree/master/content
Vitor Baptista
@vitorbaptista
:tada: :smile:
Oleg Lavrovsky
@loleg
Are there notes anywhere on the roots of the standard, specifically how much it owes to (and potentially influences the future of) https://github.com/ckan/ckan/blob/master/ckan/logic/schema.py#L206 ?