These are chat archives for frictionlessdata/chat

30th
Dec 2017
Rufus Pollock
@rufuspollock
Dec 30 2017 10:31

@roll @vitorbaptista do we have a common repo (or repos) with sample data files, e.g. sample CSVs, sample data packages, etc.?

If not, what do you think of maintaining one? It would be super useful for the community to have a common set of test data.

Stephen Gates
@Stephen-Gates
Dec 30 2017 10:32
^^ agreed, the current example repos are out of date, e.g. https://github.com/frictionlessdata/example-data-packages
Rufus Pollock
@rufuspollock
Dec 30 2017 10:43
@Stephen-Gates :thumbsup:
@Stephen-Gates do you know of any other example datasets or files?
I also started making one for Data Curator testing but it’s a bit of a mess at present https://github.com/Stephen-Gates/data-package-examples
Rufus Pollock
@rufuspollock
Dec 30 2017 10:50

Thinking about requirement for example data packages:

  1. As a Tutorial writer I want a set of data files and data packages I can use in my tutorials so that I can embed them and point users to them to try out for themselves
  2. As a Developer writing a library I want to have a set of standard test data files and data packages as a reference for my implementation tests
  3. As a new Publisher of data packages I want to see examples that I can copy and use so that I can move quickly and understand what is involved
  4. As a Consumer of data packages I want to see some examples for use

My sense is that the "exemplar" and "test" use cases are somewhat different. 1+3+4 are exemplar and want "nice" data packages. 2 (+1) are more test and are about covering the real range of situations while being super simple for testing.

My sense is that the key thing to focus on to start with is the test (lib developer) case.

They probably want versioning and the ability to use a git submodule so they can pin the data they are developing against (e.g. if the data package spec gets upgraded they can still keep old spec versions if they need them).

wdyt?
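
A rough sketch of how a library's tests might read from such a pinned examples repo, assuming it is checked out as a git submodule. The `vendor/example-data-packages` location, the `<spec-version>/<name>/datapackage.json` layout and both helper names are hypothetical; nothing like this was agreed in the chat.

```python
import json
from pathlib import Path

# Hypothetical location of the examples repo, checked out as a git submodule
# and pinned to a known commit.
EXAMPLES_ROOT = Path("vendor/example-data-packages")


def load_example(name, spec_version="v1", root=EXAMPLES_ROOT):
    """Load the descriptor of one example data package.

    Assumes the (hypothetical) layout
    <root>/<spec_version>/<name>/datapackage.json.
    """
    path = Path(root) / spec_version / name / "datapackage.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def resource_paths(descriptor):
    """Return the `path` of every resource that declares one.

    Resources with inline `data` instead of a `path` are skipped.
    """
    return [r["path"] for r in descriptor.get("resources", []) if "path" in r]
```

Keeping the spec version in the path is one way to let a library keep testing against old spec versions after an upgrade; pinning the submodule commit would cover the rest.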

Stephen Gates
@Stephen-Gates
Dec 30 2017 10:54
@rufuspollock sounds good. Agree on the differences between exemplar and test. Some of my packages have data errors on purpose that have helped discover issues in table schema.js and Data Curator.
Rufus Pollock
@rufuspollock
Dec 30 2017 11:07
@Stephen-Gates exactly
@vitorbaptista @pwalsh where is the best place to open issues about this kind of thing? I think the pm repo has been deprecated / deleted https://github.com/frictionlessdata/pm 404s ...
roll
@roll
Dec 30 2017 13:01
@rufuspollock I think Dan has started this work in https://github.com/frictionlessdata/example-data-packages

About issues: the current system is that we have two main issue trackers, aside from those of the concrete libs:

The second one, I think, could be used for everything else except the specs. Also, the FD workboard lives on this repo:

roll
@roll
Dec 30 2017 13:14
@rufuspollock @Stephen-Gates Have you ruled out the idea of having goodtables-py/CLI as a backend for goodtables-js (frictionlessdata/goodtables-js#19)? It's a pretty trivial feature to implement, and it would allow validating data locally from JavaScript, using the Python CLI to generate data quality reports. Having a pure goodtables-js is cool, but we should take into account that it could mean really a lot of work, because in JavaScript we are missing both tabulator and goodtables, which I would say are the most complex software in the Python stack. It could also always be a problem that the JavaScript implementation would be very limited compared to the Python one. Python, for example, already supports advanced checks and other advanced concepts.
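
The "CLI as a backend" pattern amounts to spawning the command-line tool and parsing a machine-readable report from its output. A minimal sketch of that pattern follows, written in Python only for illustration (a JavaScript wrapper would do the same with a child process). The `run_report_cli` helper is hypothetical, and it assumes the wrapped CLI can print its report as JSON on stdout; the exact goodtables command line should be checked against `goodtables --help` for the installed version.

```python
import json
import subprocess


def run_report_cli(command):
    """Run an external validator CLI and parse its stdout as a JSON report.

    `command` is the full argument list for the CLI. The exit code is not
    checked, because a validator may exit non-zero for invalid data while
    still printing a usable report.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    try:
        return json.loads(completed.stdout)
    except ValueError as exc:
        # Covers json.JSONDecodeError, which subclasses ValueError.
        raise RuntimeError(
            "validator did not emit JSON: " + completed.stderr.strip()
        ) from exc
```

The wrapper stays small because all validation logic lives in the CLI; the cost is that users need Python and the CLI installed locally.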
roll
@roll
Dec 30 2017 13:54
Last but not least, I wish everyone a Merry Christmas (retrospectively =) and a Happy New Year! I would say this year was absolutely stellar for the whole Frictionless Data community. The movement has finished the specs-v1 and implemented it for 9(!!!) languages. There are now the goodtables.io stack, the brand-new datahub.io, awesome projects like Data Curator and much more. It's really great to be a part of this project and community. :tada: :sparkles: :+1:
nathanxmeyer
@nathanxmeyer
Dec 30 2017 16:11
Hello frictionlessdata folks. New here (to Python, to frictionlessdata, etc). I am working on a personal project in the archaeology domain, using Python to build a SQLite database from many CSV files. My first question is whether the frictionless data schema specifications for datapackage and table are fairly stable. My second question is whether the goodtables Python implementation is something that has future viability. Last question: is there a good example of validation against a JSON schema? The readme references this, but I am not seeing in the documentation how to pass the schema to validate. Thanks very much, Nathan
nathanxmeyer
@nathanxmeyer
Dec 30 2017 16:32
Hello again, I found this (https://frictionlessdata.io/guides/validating-data/) so am fine on my last question but still interested in thoughts on the first two. Thanks!
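
For readers with the same question, the idea of validating a CSV against a Table Schema descriptor can be sketched with the standard library alone. This is not how goodtables does it and covers only a tiny subset of the spec; `check_rows` and `CASTERS` are hypothetical names, and only the `fields`, `name` and `type` keys of the descriptor are used. The linked guide shows the real tooling.

```python
import csv
import io

# Casters for a small subset of Table Schema field types.
CASTERS = {
    "string": str,
    "integer": int,
    "number": float,
}


def check_rows(csv_text, schema):
    """Return a list of (row_number, field_name, message) problems."""
    fields = schema["fields"]
    names = [f["name"] for f in fields]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    if header != names:
        msg = "header %r does not match schema fields %r" % (header, names)
        return [(1, None, msg)]
    errors = []
    # Data rows start on line 2, after the header.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(fields):
            msg = "expected %d cells, got %d" % (len(fields), len(row))
            errors.append((row_number, None, msg))
            continue
        for field, cell in zip(fields, row):
            if cell == "":
                # Table Schema treats "" as a missing value by default.
                continue
            # A field without a type defaults to "string".
            caster = CASTERS.get(field.get("type", "string"), str)
            try:
                caster(cell)
            except ValueError:
                msg = "%r is not a valid %s" % (cell, field["type"])
                errors.append((row_number, field["name"], msg))
    return errors
```

Running a check like this before loading each CSV into SQLite would surface bad cells with their row numbers instead of failing midway through an import.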
Rufus Pollock
@rufuspollock
Dec 30 2017 17:16

@nathanxmeyer first, great to hear from you. To answer your questions:

> My first question is whether the frictionless data schema specifications for datapackage and table are fairly stable.

Yes, the schema specs are definitely very stable - they are now v1.0 and have been refined for ~5y. Of course, they will continue to evolve gently but backwards compatibility will be maintained etc.

> My second question is whether the goodtables python implementation is something that has future viability.

Yes, absolutely. These have been heavily developed and will continue to be!