These are chat archives for frictionlessdata/chat

2nd Apr 2017
Adam Kariv
@akariv
Apr 02 2017 05:34
@pwalsh would that be mandatory?
Also, suppose we release FDP spec version 0.4 which is based on the new TDP v1, who should keep that version compatibility mapping?
Unless we'll have multiple spec entries?
Paul Walsh
@pwalsh
Apr 02 2017 06:09
@akariv not sure about FDP yet. Not sure how much logic is 'hard coded' in python in the OS Libs, and how much depends on the JSON Schemas.
@akariv I think v1 and forward specs should have this mandatory, yes. And the absence of it might be used to trigger backwards compatibility paths. What do you think?
Adam Kariv
@akariv
Apr 02 2017 06:16
Forget FDP: TDP spec ver x extends DP spec ver y and uses JTS spec ver z - how do you maintain this compatibility matrix?
Paul Walsh
@pwalsh
Apr 02 2017 06:22
Stick with a version of the lib that matches the version of the spec you support... but I see.
This is no different from the pre-v1 scenario, so this is not an issue with v1 as such. Going forward, we can theoretically deal with this better with explicit versioning, but we have to think about this more as we iterate on future versions.
Adam Kariv
@akariv
Apr 02 2017 06:28
I think the difference is that when you're pre-v1, people expect things to break often.
After v1, I want to be able to generate a datapackage today and know that the latest libraries a year from now will still be able to process it.
And not that I'll need to install a specific combo of library versions for each datapackage I want to open...
Adam Kariv
@akariv
Apr 02 2017 06:37
And if we have 3 or more specs that are moving separately, we need to be able to specify each of their versions separately.
Paul Walsh
@pwalsh
Apr 02 2017 07:56
@akariv correct. Let's get all the core libs primed for v1, with appropriate backwards compatibility, and we can deal with forward versioning after that. There are several approaches we can take.
Adam Kariv
@akariv
Apr 02 2017 08:08
Related: we should consider keeping (by convention) the versions of the libraries in sync with the matching spec versions, e.g. datapackage-py <=> Data Package spec, jsontableschema-py <=> Table Schema spec, etc.
And while I agree that v1 libraries shouldn't deal with spec versioning just yet, I think that the specs should (starting from v1) - or you'll end up with annoying logic (e.g. 'if the descriptor doesn't have that element, it must be from version X').
Ori Hoch
@OriHoch
Apr 02 2017 08:19
yes, agree with @akariv - we should make versioning required in the spec ASAP
also, have the tools raise warnings for any datapackage without a version
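For illustration, a minimal sketch of the fallback logic @akariv describes versus an explicit version check. The specVersions property and the warning behaviour here are hypothetical - neither is part of any published spec:

  import warnings

  def get_spec_versions(descriptor):
      # Hypothetical explicit versioning, e.g.
      # {"specVersions": {"datapackage": "1.0", "tableschema": "1.0"}},
      # with each moving spec versioned separately.
      versions = descriptor.get('specVersions')
      if versions is not None:
          return versions
      # Without an explicit version, tools must guess from the shape of
      # the descriptor - the 'annoying logic' mentioned above.
      warnings.warn('descriptor has no spec version; assuming pre-v1')
      return {'datapackage': 'pre-v1'}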
Adam Kariv
@akariv
Apr 02 2017 08:32

Unrelated - reading the data-resource spec now and some of the wording got me confused:

The dereferenced value of each referenced data source in the data array MUST be commensurate with a native, dereferenced representation of the data the resource describes. For example, in a Tabular Data Resource, this means that the dereferenced value of data MUST be an array.

  1. data is an array of URIs.
  2. Each URI should be dereferenced into an array.
  3. So data should be dereferenced into an array of arrays?
(what was the use case, again, for making data an array and not just a URI?)
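For reference, the draft wording above is talking about a resource shaped roughly like this (a hypothetical sketch, not an example taken from the spec):

  # Draft semantics: data is an array of URIs, each of which dereferences
  # to tabular data - hence the 'array of arrays' reading above.
  resource = {
      'name': 'example',
      'data': [
          'http://example.com/part-1.csv',
          'http://example.com/part-2.csv',
      ],
  }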
Adam Kariv
@akariv
Apr 02 2017 08:45
Is there an implicit concatenation semantic between two URIs in a tabular resource? Or are all the URIs alternatives to each other?
If it's the latter, then the examples in the spec really do a bad job of clarifying that.
Paul Walsh
@pwalsh
Apr 02 2017 08:56
The use case is a data source spread across multiple files.
Adam Kariv
@akariv
Apr 02 2017 09:00
And how should the multiple files be combined into one? For tabular data it's relatively simple, but what about non-tabular resources?
It just seems like a complication that isn't cost-effective for the benefit it might bring (sorry for only raising this now, but I did miss the "array" part in the entire "data" property discussion)
Paul Walsh
@pwalsh
Apr 02 2017 09:16
@akariv the array for data is pre-v1, but was in the lead-up to v1 and not implemented anywhere that I know of. We allowed path (which also, pre-v1, merged path and url into a single property) to be a string or an array. For v1, we've just enforced an array, which can of course be an array with one item. We did this as part of removing the pattern throughout the spec of "string or array" and "string or object" to make things more explicit.
As for concat of non-tabular resources, it is undefined at this stage.
This is now just data: we got rid of all the special handling across path, url, and data, some of which led to undefined behaviour in implementations, and moved towards a single property with consistent behaviour.
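Under the v1 behaviour described here, a resource would look roughly like this (a sketch with made-up file names):

  # v1 style: path is always an array, even for a single source file.
  resource = {
      'name': 'example',
      'path': ['data/example.csv'],
  }

  # A data source spread across multiple files of the same structure:
  split_resource = {
      'name': 'example-split',
      'path': ['data/example-2016.csv', 'data/example-2017.csv'],
  }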
Ori Hoch
@OriHoch
Apr 02 2017 09:33
unrelated question -
do we have a Python library that provides an encode_value function like this -
assert encode_value(datetime.date(2016, 5, 4), {"name": "date", "type": "date", "format": "%d/%m/%Y"}) == "04/05/2016"
?
or, more generally - which library is responsible for encoding the JSON Table Schema values?
The closest functionality I found so far is in the opposite direction:
assert jsontableschema.Field({"name": "date", "type": "date", "format": "fmt:%d/%m/%Y"}).cast_value("04/05/2016") == date(2016, 5, 4)
Paul Walsh
@pwalsh
Apr 02 2017 09:38
@OriHoch we don’t have anything in the opposite direction. Would be an excellent feature to add to Field
Ori Hoch
@OriHoch
Apr 02 2017 09:51
Cool, what would you call this function? Field.encode_value?
Adam Kariv
@akariv
Apr 02 2017 09:56
uncast?
Paul Walsh
@pwalsh
Apr 02 2017 10:00
I think uncast_value is the better of the two, as it's clearer that it is the opposite of cast_value
Ori Hoch
@OriHoch
Apr 02 2017 10:00
:thumbsup:
Paul Walsh
@pwalsh
Apr 02 2017 10:01
@OriHoch this is a very nice feature to add :)
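As a rough sketch, an uncast_value for the date case could look like the following, mirroring the cast_value example above; this is an illustration, not the jsontableschema library's actual code:

  import datetime

  class Field(object):
      def __init__(self, descriptor):
          self.descriptor = descriptor

      def uncast_value(self, value):
          # Opposite of cast_value: render a native Python value as a
          # string according to the field's type and format.
          if self.descriptor['type'] == 'date':
              fmt = self.descriptor.get('format', '%Y-%m-%d')
              # Pre-v1 formats carried a 'fmt:' prefix.
              if fmt.startswith('fmt:'):
                  fmt = fmt[len('fmt:'):]
              return value.strftime(fmt)
          raise NotImplementedError('only date is shown in this sketch')

  field = Field({'name': 'date', 'type': 'date', 'format': 'fmt:%d/%m/%Y'})
  assert field.uncast_value(datetime.date(2016, 5, 4)) == '04/05/2016'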
Adam Kariv
@akariv
Apr 02 2017 10:02
But maybe start with the Python library and not the PHP implementation :)
BTW, in datapackage-pipelines I had a similar problem - I'm creating CSV files with a Table Schema, and I want to make sure the schema is correct (wrt format and other properties).
So I created a PYTHON_DIALECT, which is a set of schema attributes matching what you get when using Python's str method:
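As an illustrative guess at what such a mapping could look like (these exact values are assumptions, not taken from the datapackage-pipelines source):

  # Formats matching str() on Python's native types, e.g.
  # str(datetime.date(2016, 5, 4)) == '2016-05-04'.
  PYTHON_DIALECT = {
      'date': {'format': '%Y-%m-%d'},
      'time': {'format': '%H:%M:%S'},
      'datetime': {'format': '%Y-%m-%d %H:%M:%S'},
  }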