These are chat archives for frictionlessdata/chat

31st
Mar 2017
Rufus Pollock
@rufuspollock
Mar 31 2017 05:39

#thinkingoutloud re: using the JSON Table Schema with JSON data.

What happens when someone has their tabular data as JSON? I’ve noticed this recently in many of the Vega sample datasets used in their demos, and I think it is a common pattern, especially for inline data.

Strictly, these are not “tabular data resources” because the source data is not CSV. But in a sense they are just that: tabular data already pre-processed into JSON. Now, I don’t think these should be considered Tabular Data Resources, because JSON is not CSV and the CSV matters (think of a non-tech user with Excel: JSON does not work).

However, you can use the JTS for this JSON just as for the CSV. And from the point of view of things like a graphing library, this is like CSV but even better: the casting of CSV to JSON is already done!
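A minimal Python sketch of this point, assuming a tiny Table Schema-style descriptor and a hypothetical `CASTERS` mapping (this is not the TS-py API): the same schema describes both sources, but CSV values arrive as strings and must be cast, while JSON values arrive already typed.

```python
import csv
import io
import json

# A minimal Table Schema-style descriptor: fields with a name and a type.
SCHEMA = {
    "fields": [
        {"name": "city", "type": "string"},
        {"name": "year", "type": "integer"},
        {"name": "lat", "type": "number"},
    ]
}

# Hypothetical mapping from Table Schema type names to Python casters.
CASTERS = {"string": str, "integer": int, "number": float}

CSV_TEXT = "city,year,lat\nBerlin,2017,52.52\nParis,2017,48.86\n"
JSON_TEXT = (
    '[{"city": "Berlin", "year": 2017, "lat": 52.52},'
    ' {"city": "Paris", "year": 2017, "lat": 48.86}]'
)


def cast_row(row, schema):
    """Cast each value to the Python type named by its field's declared type."""
    return {f["name"]: CASTERS[f["type"]](row[f["name"]]) for f in schema["fields"]}


def rows_from_csv(text, schema):
    # csv.DictReader yields every value as a string, so each one must be cast.
    return [cast_row(r, schema) for r in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    # json.loads already yields str/int/float values, so no casting step is needed.
    return json.loads(text)
```

With this data, `rows_from_csv(CSV_TEXT, SCHEMA)` and `rows_from_json(JSON_TEXT)` produce the same list of rows; only the CSV path needed the schema to get there.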

I’m wondering how we handle this, if at all ...

roll
@roll
Mar 31 2017 08:07
TS doesn't require a CSV file as a data source, but while reviewing and updating TS-py I've realized that the spec tends to talk about strings as the source. For example, year is "an integer with 4 digits". It feels like we're talking about a string here. I suppose it's because the spec overall lacks strict concepts distinguishing the representation of data as a string from cast/typed/native/parsed values.
For example, I suppose a year is something that (a) is a valid integer in spec terms and (b) is >= 0 and <= 9999.
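To make the distinction concrete, here is a minimal Python sketch that separates the physical representation (a string in CSV) from the logical value (an integer). The names `is_year_value` and `cast_year` are hypothetical, and the 0 to 9999 range is the definition proposed above, not necessarily what the spec says. A JSON number such as 2017 can be checked directly, without going through a string.

```python
def is_year_value(value):
    """Hypothetical logical check for the proposed year definition:
    (a) a valid integer and (b) 0 <= value <= 9999. Works on typed values."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 9999


def cast_year(raw):
    """Parse a string from a text source (e.g. CSV) into an int, then apply the logical check."""
    value = int(raw.strip())
    if not is_year_value(value):
        raise ValueError(f"not a year under this rule: {raw!r}")
    return value
```

Under this split, "0042" and 42 are the same year: the digit count belongs to the string representation, not to the value.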
Rufus Pollock
@rufuspollock
Mar 31 2017 08:30

@roll TS = Table Schema. Yes, I wrote it so it does not require CSV :-)

But Tabular Data Resource does, and that is what I was talking about …

roll
@roll
Mar 31 2017 09:13
@rufuspollock Yes, we're on the same page. I was just adding a side note about Table Schema's readiness to describe data sources like JSON. We had a discussion with @pwalsh just a few days ago about exactly the same topic (JSON for Tabular Data Resource). He mentioned NDJSON, which I think would be great to support.
Rufus Pollock
@rufuspollock
Mar 31 2017 09:20
@roll :smile: I think plain JSON will be more important than NDJSON, but both would be nice.
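Both JSON layouts mentioned here can be read into the same list of row objects. A minimal sketch using only Python's standard `json` module; the function names are hypothetical, and it assumes row-oriented data (an array of objects for plain JSON, one object per line for NDJSON).

```python
import json

PLAIN_JSON = '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]'
NDJSON = '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n'


def rows_from_plain_json(text):
    """Plain JSON: the whole table is a single array of row objects."""
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    return rows


def rows_from_ndjson(text):
    """NDJSON: one JSON object per line; blank lines are skipped."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```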
jobarratt
@jobarratt
Mar 31 2017 09:29
The second of two case studies added to Frictionlessdata.io this week: an interview with Data.world http://frictionlessdata.io/case-studies/data-world/
rufuspollock

@rufuspollock reads

Graph data packages, or “Universal Data Packages” that can encapsulate both tabular and graph data.

@jobarratt have you told them about the data package views stuff and pointed them at the issue on that, frictionlessdata/specs#77?

I’ve also got a huge update on views that I’ve been working on over the last few months and that is nearly ready; I can paste it there as a work in progress.

Rufus Pollock
@rufuspollock
Mar 31 2017 09:49
@jobarratt great to have these case studies :clap:
roll
@roll
Mar 31 2017 10:16
@rufuspollock My initial thought was about plain JSON. It was in a context like: if inline data leads to so many discussions, why not just disallow inline data and add support for JSON in Tabular Data Resources? People would still be able to write their beloved JSON, but without mixing metadata (descriptor) and data (sources), which I suppose is an important concept. But JSON is not a streamable format by nature, so it kind of breaks one of the FD principles (streamability).
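The streamability concern can be illustrated with a partially received file. In this sketch (hypothetical helper names, standard library only), NDJSON rows are usable line by line as they arrive, whereas Python's `json.loads` cannot parse a plain JSON array until its closing bracket has arrived. Incremental third-party JSON parsers do exist, so this shows the simple case rather than a hard limit.

```python
import json


def stream_ndjson(lines):
    """Yield one parsed row per NDJSON line as soon as that line is available."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def try_parse_plain_json(text):
    """Return the parsed document, or None if the text is not (yet) complete JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Simulate a transfer that has delivered only part of the data so far.
partial_ndjson = ['{"id": 1}\n', '{"id": 2}\n']  # third line not received yet
partial_json = '[{"id": 1}, {"id": 2}, '  # rest of the array not received yet
```

Here `stream_ndjson(partial_ndjson)` already yields the first two rows, while `try_parse_plain_json(partial_json)` returns `None` until the rest of the array is appended.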
Rufus Pollock
@rufuspollock
Mar 31 2017 12:03
@roll the two things are different here: inline data is crucial for many portable use cases, e.g. use in Jupyter notebooks.
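To illustrate the distinction, here is a sketch of a resource descriptor that carries its rows inline under `data`, as opposed to pointing at a file via `path`. `resource_rows` is a hypothetical helper, and it assumes the file behind `path` holds a JSON array of row objects.

```python
import json
from pathlib import Path


def resource_rows(resource, base_dir="."):
    """Return rows from a Data Resource-style dict.

    Inline rows live under `data`; otherwise `path` is assumed to point at a
    local JSON file holding an array of row objects (an assumption of this sketch).
    """
    if "data" in resource:
        return resource["data"]
    if "path" in resource:
        return json.loads((Path(base_dir) / resource["path"]).read_text(encoding="utf-8"))
    raise ValueError("resource has neither `data` nor `path`")


# Inline resource: descriptor and data travel together, e.g. in one notebook cell.
INLINE = {
    "name": "cities",
    "data": [{"city": "Berlin", "year": 2017}, {"city": "Paris", "year": 2017}],
    "schema": {
        "fields": [
            {"name": "city", "type": "string"},
            {"name": "year", "type": "integer"},
        ]
    },
}
```

The inline form needs no file system at all, which is what makes it portable; the `path` form keeps metadata and data apart.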
Ethan Jewett
@esjewett
Mar 31 2017 12:40
Not to derail this discussion, but year is not an integer with 4 digits ;-) And yes, I’ve been working with historians, including people who work on ancient history.
Years can have 1, 2, 3, 4, or more digits. Years can be negative. It’s fun!
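By contrast with the 4-digit rule, a textual year parser that follows this description might accept an optional minus sign and any number of digits. A minimal sketch; `parse_year` and the pattern are hypothetical, and it does not decide which era convention a negative value uses (for example, in astronomical year numbering year 0 is 1 BC, so -44 is 45 BC).

```python
import re

# Hypothetical pattern: optional minus sign, then one or more ASCII digits,
# so "-44", "9", "476" and "12000" are all accepted.
YEAR_PATTERN = re.compile(r"-?[0-9]+")


def parse_year(raw):
    """Parse a textual year, allowing negative values and any number of digits."""
    text = raw.strip()
    if not YEAR_PATTERN.fullmatch(text):
        raise ValueError(f"not a year: {raw!r}")
    return int(text)
```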
Rufus Pollock
@rufuspollock
Mar 31 2017 13:39
@esjewett yes, exactly - see https://github.com/oki-archive/flexidate (one of my old projects!)
Ethan Jewett
@esjewett
Mar 31 2017 14:13
@rufuspollock Yeah, for Palladio we looked at a bunch of stuff around uncertainty and negative dates. Ended up doing a workaround to handle negative years at least, and decided not to pursue a comprehensive approach to uncertainty. But a very interesting area. It is really frustrating for historians to work with most standard parsers :-(
Rufus Pollock
@rufuspollock
Mar 31 2017 14:14
@esjewett absolutely, I built flexidate for a now-defunct project called Weaving History that we did from 2007 to 2011 http://www.weavinghistory.org/
Ethan Jewett
@esjewett
Mar 31 2017 14:14
@rufuspollock Maybe too much to ask, but it would be nice if the specification acknowledged this issue. Was surprised to see this definition in the Table Schema spec.
@rufuspollock Ah, very cool. I think that may have been one of the projects we included in our survey of pre-existing work when we started Palladio!