These are chat archives for frictionlessdata/chat

18th
May 2017
Ori Hoch
@OriHoch
May 18 2017 05:56
@jcockhren @akariv - I'm also doing similar stuff with datapackages and elasticsearch :+1:
Paul Walsh
@pwalsh
May 18 2017 06:08
Hey @OriHoch !
How are the php implementations going?
Ori Hoch
@OriHoch
May 18 2017 06:10
they are progressing well, you can install the latest versions from packagist :sparkles:
https://packagist.org/packages/frictionlessdata/tableschema
https://packagist.org/packages/frictionlessdata/datapackage
they are usable and stable, just don't have all the features, perhaps missing some validations etc..
Paul Walsh
@pwalsh
May 18 2017 06:15
Awesome @OriHoch ! I won’t install just yet. Do you have a timeline now for when you will have them ~ feature complete?
Ori Hoch
@OriHoch
May 18 2017 06:43
my current estimation is sometime in July
Paul Walsh
@pwalsh
May 18 2017 06:57
great!
Rufus Pollock
@rufuspollock
May 18 2017 07:19
@pwalsh also when can Sam do the CSS touch-ups? I could get my team to look at this quickly - I think beautiful (and more importantly readable) makes a difference :-)
Paul Walsh
@pwalsh
May 18 2017 07:25
@jobarratt does Sam have any availability in the coming week or two for 1 day of CSS/HTML work on the specs?
Rufus Pollock
@rufuspollock
May 18 2017 07:50
@pwalsh in progress issue for this here frictionlessdata/specs#413

NEWSFLASH - Major change to spec layouts to improve readability

I've just pushed a major rework of the layout of the specs to give them more readability - frictionlessdata/specs#420

Here's what some of the key specs now look like:

https://specs.frictionlessdata.io/data-resource/
https://specs.frictionlessdata.io/data-package/

This was primarily a change to the layout and readability of the specs. There were also a few substantive changes at the same time:

  • Change back to data and path properties on resources from simple data - frictionlessdata/specs#414
  • URIs as a general concept are gone - we are back with either plain URLs for the licenses and sources web property or the url or path option for path on resources
jobarratt
@jobarratt
May 18 2017 08:22
@pwalsh Sam has no time as it is. However, let me see if we can squeeze a few hours from somewhere next week.
sirex
@sirex
May 18 2017 09:44
I really like the Examples section in the Data Resource specs. It would be nice to see the same thing in the Data Package specs.
Adam Shepherd
@ashepherd
May 18 2017 12:33
Hi @pwalsh @jcockhren @akariv I'm also interested in the Table Schema > Elasticsearch use case. I reached out on the Trello board and Dan Fowler pointed me here. We are evaluating Data Packages for a research project funded by NSF for searching through ocean proteomics data. We are looking for a way for researchers to submit data packages to a repository where the data will be validated and loaded into an Elasticsearch index. Remembering the presentation @rufuspollock made on data containers at the Research Data Alliance, we felt it was a good candidate for our research project. We would be interested in collaborating on that pipeline if you are interested.
Jurnell Cockhren
@jcockhren
May 18 2017 15:27
@ashepherd Awesome! We've built what you've described internally (re: "data will be validated and loaded into an Elasticsearch index").
Jurnell Cockhren
@jcockhren
May 18 2017 15:39
@OriHoch @ashepherd @akariv In regards to getting CSV data into Elasticsearch, we've learned a few lessons worth sharing. These will probably require more discussion on github issues at some point:
  1. numeric types. Elasticsearch has integer, float, scaled_float (there's more). scaled_float type is a good candidate for representing dollar values. I have production code that uses it. https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#scaled-float-params
  2. Mapping Definition -> we use Elasticsearch DSL to codify our mappings and we can dynamically define mappings for new Indices. Easy to port given a valid tableschema definition.
  3. Inserting Documents -> Use bulk inserting where you can.
  4. Index Creation -> make it easy for developers to define the index settings (# of shards, replicas, refresh_interval)
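
(Editor's illustration, not code shared in this chat.) The four points above can be sketched roughly as follows, in plain Python with no Elasticsearch client. The type lookup table, the function names `build_index_body` and `bulk_actions`, and the `scaled_fields` parameter are all hypothetical. The index body uses the typeless `mappings.properties` shape of recent Elasticsearch versions (older versions nest the properties under a document type), and the action dicts follow the shape accepted by the Python client's `elasticsearch.helpers.bulk`.

```python
# Assumed lookup: Table Schema field type -> Elasticsearch field type.
# Deliberately partial; unknown types fall back to "keyword".
TABLESCHEMA_TO_ES = {
    "string": "keyword",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
    "date": "date",
    "datetime": "date",
}


def build_index_body(schema, scaled_fields=None, shards=1, replicas=0,
                     refresh_interval="1s"):
    """Build an index-creation body from a Table Schema descriptor.

    scaled_fields (hypothetical) maps field name -> scaling_factor for
    fields that should be stored as scaled_float, e.g. dollar amounts.
    """
    scaled_fields = scaled_fields or {}
    properties = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in scaled_fields:
            # scaled_float requires an explicit scaling_factor.
            properties[name] = {
                "type": "scaled_float",
                "scaling_factor": scaled_fields[name],
            }
        else:
            # Table Schema treats a missing type as "string".
            ts_type = field.get("type", "string")
            properties[name] = {
                "type": TABLESCHEMA_TO_ES.get(ts_type, "keyword")
            }
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        },
        "mappings": {"properties": properties},
    }


def bulk_actions(index_name, rows):
    """Yield one bulk action per row (each row is a dict keyed by field name)."""
    for row in rows:
        yield {"_index": index_name, "_source": row}


# Example descriptor (made up for illustration).
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "amount", "type": "number"},
        {"name": "label"},
    ]
}
body = build_index_body(schema, scaled_fields={"amount": 100})
actions = list(bulk_actions("payments", [{"id": 1, "amount": 9.99, "label": "a"}]))
```

With a real client, `body` would be passed when creating the index and the generator handed to a bulk helper, so the whole load is driven by the Table Schema descriptor plus a few developer-chosen settings.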
Adam Shepherd
@ashepherd
May 18 2017 16:45
@jcockhren nice! thank you for passing these along. have you already written code to go from a Data Package into an ES index? If so, would you be willing to share? If not, would you be interested in collaborating with us?
Jurnell Cockhren
@jcockhren
May 18 2017 19:36
@ashepherd DataPackage to ES is happening in our next sprint. Once we have a working POC, we'll be opening up the codebase for contributors! So yes to collaboration.
Tod Robbins
@todrobbins
May 18 2017 20:36
@rufuspollock great work on the spec rework! 👍🏻👍🏻👍🏻
Adam Shepherd
@ashepherd
May 18 2017 21:24
@jcockhren that's great! Please let us know when you've opened it up and we'd be happy to help test and iterate.