These are chat archives for frictionlessdata/chat

12th
Jan 2018
Oleg Lavrovsky
@loleg
Jan 12 2018 12:23
Is there a boilerplate Data Package repository out there already?
Rufus Pollock
@rufuspollock
Jan 12 2018 12:24
@loleg what do you mean exactly?
Oleg Lavrovsky
@loleg
Jan 12 2018 12:28
I hope I haven't reinvented a wheel.. https://github.com/schoolofdata-ch/datapackage-boilerplate
Rufus Pollock
@rufuspollock
Jan 12 2018 12:33
@loleg ah ok, you mean like a template data package they can fork :-) - great idea.
Oleg Lavrovsky
@loleg
Jan 12 2018 12:34
yep! something to help people contribute packages. I am very open to feedback on how to do this.
by the way, what happened to datapackagist? it was a little clunky, but still very nice to use
Rufus Pollock
@rufuspollock
Jan 12 2018 12:37

yep! something to help people contribute packages. I am very open to feedback on how to do this.

@loleg the data init command is also helpful for creating the datapackage.json (the data command line tool is here: https://github.com/datahq/data-cli). Have you used data init yet?

Now that you mention it, I think data init could scaffold things a bit more, e.g. create a README ..., create a data directory, etc.

Oleg Lavrovsky
@loleg
Jan 12 2018 12:38
I didn't realise that was part of the CLI since it's not in the help. and can datapackage-pipelines generate data packages as well?
Yeah init works for me, but it's a little minimal. You don't see in the generated .json how to put in a license, for instance.
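For reference, a minimal sketch of where a license goes in datapackage.json, assuming the Data Package specification's `licenses` property (a list of objects with `name`, `path` and an optional `title`). The helper name `add_license` and the package and resource names are hypothetical, chosen only for illustration; this is not output from data init.

```python
import json


def add_license(descriptor, name, path, title=None):
    """Return a copy of the descriptor with one more entry in "licenses"."""
    entry = {"name": name, "path": path}
    if title:
        entry["title"] = title
    updated = dict(descriptor)
    updated["licenses"] = list(descriptor.get("licenses", [])) + [entry]
    return updated


# Placeholder descriptor, roughly the minimal shape a generated file might have.
descriptor = {
    "name": "example-package",
    "resources": [{"name": "data", "path": "data/data.csv"}],
}

licensed = add_license(
    descriptor,
    name="ODC-PDDL-1.0",
    path="http://opendatacommons.org/licenses/pddl/",
    title="Open Data Commons Public Domain Dedication and License v1.0",
)

print(json.dumps(licensed, indent=2))
```

The original descriptor is left untouched, so the result can be compared against it before writing the file back.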
Rufus Pollock
@rufuspollock
Jan 12 2018 12:39
@loleg yes! And data package pipelines is basically built into the datahub so you can use the datahub to generate a data package for you from e.g. a single url.
Oleg Lavrovsky
@loleg
Jan 12 2018 12:40
Plus data init should accept a folder/file as parameter and infer the schema, possibly even the license from LICENSE (à la GitHub)
@rufuspollock I don't really understand what you mean by that last statement. Where on datahub.io can I put in a URL to generate a package?
Rufus Pollock
@rufuspollock
Jan 12 2018 12:42

@loleg we are working to improve init.

For less expert users the most attractive thing might be data-desktop: http://datahub.io/download (and https://github.com/datahq/data-desktop). This gives you a user interface, though you can only edit certain things atm (mainly the table schema). But it would be really easy to improve this to allow people to edit.

Oleg Lavrovsky
@loleg
Jan 12 2018 12:42
Great, thanks, I'll mention all that in the boilerplate.
Rufus Pollock
@rufuspollock
Jan 12 2018 12:43

Plus data init should accept a folder/file as parameter and infer the schema, possibly even the license from LICENSE (à la GitHub)

data init does infer the schema if you run it in a given directory with data in it (perhaps that is not so obvious).
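
To illustrate what inferring a schema from data can mean, here is a rough sketch using only the Python standard library. It is not how data init does it; the function names `infer_type` and `infer_schema` are hypothetical, and only three Table Schema types (`integer`, `number`, `string`) are considered.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of integer, number, string that fits all non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue  # this type does not fit, try the next wider one
    return "string"


def infer_schema(csv_text):
    """Build a Table Schema-like dict from CSV text whose first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {"fields": []}
    header, body = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in body if index < len(row)]
        fields.append({"name": name, "type": infer_type(column)})
    return {"fields": fields}


sample = "id,price,city\n1,9.5,Bern\n2,12,Zurich\n"
print(infer_schema(sample))
```

A real tool would also need to handle dates, booleans, missing-value markers and large files (by sampling rows), which this sketch leaves out.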

It does not infer license (how would you want to do that?)

Oleg Lavrovsky
@loleg
Jan 12 2018 12:43
Based on the LICENSE file, for common open licenses, it should add the appropriate sections to datapackage.json
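A minimal sketch of that idea, assuming a small lookup table of distinctive phrases. The phrases in `KNOWN_LICENSES`, the table itself and the function names are illustrative assumptions, not verified against the official license texts and not part of data-cli; a real implementation would need a more robust matcher.

```python
# Hypothetical lookup: a lowercase phrase expected in the LICENSE text,
# paired with the entry to add under "licenses" in datapackage.json.
KNOWN_LICENSES = [
    (
        "public domain dedication and license",
        {"name": "ODC-PDDL-1.0", "path": "http://opendatacommons.org/licenses/pddl/"},
    ),
    (
        "cc0 1.0 universal",
        {"name": "CC0-1.0", "path": "https://creativecommons.org/publicdomain/zero/1.0/"},
    ),
    (
        "creative commons attribution 4.0 international",
        {"name": "CC-BY-4.0", "path": "https://creativecommons.org/licenses/by/4.0/"},
    ),
]


def detect_license(license_text):
    """Return a licenses entry for the first known phrase found, else None."""
    normalized = " ".join(license_text.lower().split())  # collapse whitespace
    for phrase, entry in KNOWN_LICENSES:
        if phrase in normalized:
            return dict(entry)
    return None


def apply_license(descriptor, license_text):
    """Add the detected license to the descriptor; leave it unchanged if unknown."""
    entry = detect_license(license_text)
    if entry is not None:
        descriptor.setdefault("licenses", []).append(entry)
    return descriptor


package = apply_license({"name": "example-package"}, "CC0 1.0\nUniversal\n\nStatement of Purpose ...")
print(package)
```

Returning `None` for unrecognised text keeps the tool from guessing: the user can then fill in the license by hand.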
Rufus Pollock
@rufuspollock
Jan 12 2018 12:44
@loleg ah ok, if you have a license file it should try to parse it. Please open an issue about that here: https://github.com/datahq/data-cli/issues
Oleg Lavrovsky
@loleg
Jan 12 2018 12:45
Will do. Now I have a déjà vu feeling going through init's detection of files. The validation is great (should just not interrupt the whole process, yep, I'll ticket it)!
Rufus Pollock
@rufuspollock
Jan 12 2018 12:49

by the way, what happened to datapackagist? it was a little clunky, but still very nice to use

@loleg re that question we're working on a little app to do just this right now here: https://github.com/datahq/data-import-ui

The thing we are focused on is a mini-wizard for turning a file or url into a data package quickly and well. In my experience that is the real challenge: just creating a datapackage.json is not enough.

@loleg patches for data init would be really welcome :-)

Thanks @strets123 @rufuspollock Can any of you help in 1-1 chat?

@mshashank07 you are welcome to message me on 1-on-1 and i'll see if i can respond asynchronously ...

Oleg Lavrovsky
@loleg
Jan 12 2018 12:51
datahq/data-cli#239 added
Rufus Pollock
@rufuspollock
Jan 12 2018 12:54
@loleg my one other comment re the data package boilerplate is that the README might want to follow the structure for data packages here: https://datahub.io/docs/data-packages/publish-faq#readme
Oleg Lavrovsky
@loleg
Jan 12 2018 12:56
datahq/data-cli#241 added
datahq/data-cli#240 added
@rufuspollock you're absolutely right, but, ahem, I was looking at the top github.com/datasets for reference and none of them had Introduction, for instance, so that recommendation might need to be updated..
oh I see, never mind, it's not a section header.
Still, I have a qualm with the "Preparation" section. Data Packages are meant to be consumed, not prepared. But they are sourced. That's why I have been focusing on "talking about the data" in the Data section, and describing "where the data comes from" in a Source section. Perhaps we should have a discussion about this elsewhere ;-)
Or maybe this is the right place to discuss it, and I am missing an important nuance of frictionless data here. Are all data packages de facto recipes? Is the data folder just a sample, while the real, up-to-date, raw data is always somewhere else?
Rufus Pollock
@rufuspollock
Jan 12 2018 13:03

@loleg I think this is the right place to have this discussion. The point about preparation is about how the data in the data package was created. So yes, data packages are meant to be consumed, but just like a cake lists its ingredients, so can a data package list how it was made (if appropriate).

The data folder has all the data for the data package - but often you have built a data package from some upstream source (as with most core data packages ...).

This is all definitely modifiable: better naming for this stuff is important ...

Note: i will be afkb for a bit ..
Oleg Lavrovsky
@loleg
Jan 12 2018 13:05
You are what you eat! Thanks very much. I would also appreciate your feedback (not urgently) on the suggested wording in the License section of the boilerplate, which you'll see hints at the kind of process we are going through in our community.
Stephen Gates
@Stephen-Gates
Jan 12 2018 20:49