These are chat archives for frictionlessdata/chat

16th
Mar 2016
roll
@roll
Mar 16 2016 09:32
:+1:
Vitor Baptista
@vitorbaptista
Mar 16 2016 10:22
@jamiekt tox uses its own environment, saving everything inside the .tox folder in the project directory. To make it download the dependencies again, you can remove this folder.
Or, better yet, simply run tox --recreate and it'll do it itself
Thanks for the tox output, it's very useful. I see there are many places where our code fails on Windows :cry: We definitely need to set up AppVeyor
Jamie Thomson
@jamiekt
Mar 16 2016 10:38
:+1:
Thanks again @vitorbaptista. It's working for me, that's the most important thing.
Background to this (in case anyone is interested) is that I'm development lead on a Hadoop-based platform that (from a 10,000 ft view) aggregates sales data to make it useful for the purposes of "data science". We don't (yet) want the data scientists in our organisation to have free rein to query our Hadoop clusters so we've built a tool to output data from Hadoop (more specifically, Impala) into a tabular datapackage.
I do intend to open source that tool, but there's some work that needs to happen first:
  • tidy up the code
  • decouple the generic, sharable part of the tool from the proprietary part that can't be shared. Currently it's all jumbled up into a bit of a hot mess so we have some work to unpick it.
Paul Walsh
@pwalsh
Mar 16 2016 10:41
@jamiekt that sounds great! We’ll do everything we can to help. Looks like there might be a blog post in there too! We’d love to share your use case(s) with the wider community.
Vitor Baptista
@vitorbaptista
Mar 16 2016 10:43
Cool, @jamiekt! Does your platform have any website I could see?
Jamie Thomson
@jamiekt
Mar 16 2016 10:48
@vitorbaptista nope, it's wholly internal right now. We have future plans to make it available in a SaaS model but that's not our priority right now
Vitor Baptista
@vitorbaptista
Mar 16 2016 10:58
Fair enough. It's great to see the datapackages used in the wild, though. If you have any comment/suggestion/problem, please come and tell us :)
Jamie Thomson
@jamiekt
Mar 16 2016 11:25
you know I will :)

I do have one question actually. Our tool:

  1. produces all the CSV files & datapackage.json then
  2. zips them all up.

Is the "zipping up" considered...oh I dunno...normal? Does the datapackage spec dictate that all those files should be zipped up? Obviously zipping them up is done for convenience (easy to move them around) but what does the spec say about that?
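
For illustration, the two steps above could look roughly like the sketch below. It uses only the Python standard library, not datapackage-py, and every name in it (`build_datapackage_zip`, the `data/` folder, the sample table) is a hypothetical stand-in rather than anything from the actual tool or the spec.

```python
import csv
import io
import json
import zipfile


def build_datapackage_zip(name, tables):
    """Return the bytes of a zip holding one CSV per table plus datapackage.json.

    `tables` maps a resource name to a (header, rows) pair. The layout
    (CSV files under data/) is an assumption made for this sketch.
    """
    resources = []
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for resource_name, (header, rows) in tables.items():
            # Step 1: produce the CSV content for this table.
            csv_text = io.StringIO()
            writer = csv.writer(csv_text, lineterminator="\n")
            writer.writerow(header)
            writer.writerows(rows)
            path = f"data/{resource_name}.csv"
            # Step 2: add the file to the archive.
            archive.writestr(path, csv_text.getvalue())
            resources.append({"name": resource_name, "path": path})
        # The descriptor lists every resource with its path inside the zip.
        descriptor = {"name": name, "resources": resources}
        archive.writestr("datapackage.json", json.dumps(descriptor, indent=2))
    return buffer.getvalue()


if __name__ == "__main__":
    payload = build_datapackage_zip(
        "sales", {"orders": (["id", "amount"], [[1, 9.5], [2, 3.0]])}
    )
    with open("sales.zip", "wb") as handle:
        handle.write(payload)
```

Building the archive in memory keeps the sketch free of temporary files; a real tool exporting large tables would more likely stream each CSV to disk first.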

Paul Walsh
@pwalsh
Mar 16 2016 11:31
the datapackage-py has a convenience method for zip
it is not addressed in the spec
but it is obviously handy :)
Vitor Baptista
@vitorbaptista
Mar 16 2016 11:31
There isn't a spec about compressing datapackages yet. In datapackage-py, I've implemented a save() method which generates a .zip file with the datapackage.json and the data files. It's not part of the spec yet, though. It's being discussed in dataprotocols/dataprotocols#132
Jamie Thomson
@jamiekt
Mar 16 2016 11:34
got it, thanks guys
Vitor Baptista
@vitorbaptista
Mar 16 2016 11:34
I would follow the pattern generated by GitHub (like https://github.com/datasets/gdp/archive/master.zip), as this is a use case we will certainly support whenever the spec is written.
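
As far as I know, GitHub's branch archives unpack into a single top-level folder named after the repository and branch (gdp-master/ in that example), with datapackage.json directly inside it. A reader that wants to accept both that layout and a flat zip could be sketched like this. The function name `read_descriptor` is hypothetical, and this is only an assumption about how a consumer might behave, not what datapackage-py or the spec does.

```python
import io
import json
import zipfile


def read_descriptor(zip_bytes):
    """Load datapackage.json from a zip, at the root or under one top-level folder."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        candidates = [
            name
            for name in archive.namelist()
            # Flat layout: the descriptor sits at the archive root.
            if name == "datapackage.json"
            # GitHub-style layout: exactly one folder above the descriptor.
            or (name.count("/") == 1 and name.endswith("/datapackage.json"))
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one datapackage.json near the archive root")
        return json.loads(archive.read(candidates[0]))
```

Rejecting archives with zero or several candidate descriptors avoids silently picking the wrong one.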
Jamie Thomson
@jamiekt
Mar 16 2016 11:35
:+1:
Daniel Fowler
@danfowler
Mar 16 2016 17:17
I think we can move forward on addressing this in the spec, as we essentially have everything we need now (a working implementation, broad support behind the idea, explicit external use cases) :pencil: cc: @rgrp