These are chat archives for frictionlessdata/chat

29th
Apr 2016
Rufus Pollock
@rufuspollock
Apr 29 2016 08:56 UTC
Morning all - making progress on various items. Old data.okfn.org repo now at https://github.com/frictionlessdata/ideas
:+1:
Rufus Pollock
@rufuspollock
Apr 29 2016 09:00 UTC
Next step is to clean up that issue tracker - moving data.okfn.org website stuff to https://github.com/okfn/data.okfn.org-new/issues and closing it in the ideas tracker as duplicate
Rufus Pollock
@rufuspollock
Apr 29 2016 10:39 UTC
do people think it would be good to have a full catalog of tools on the site? Current discussion here: frictionlessdata/frictionlessdata.io#5
Gustavo Silva
@gsilvapt
Apr 29 2016 11:00 UTC
A catalog and instructions, definitely :+1:
Rufus Pollock
@rufuspollock
Apr 29 2016 11:12 UTC
Overall sprint description (getting updated): frictionlessdata/frictionlessdata.io#3
Gustavo Silva
@gsilvapt
Apr 29 2016 12:01 UTC
We're having an issue here with those historic datasets. So, you'd rather have it in one single data package? GDPs can be done, but the population numbers seem to be offsetting the data package. I just want to confirm this because if I have to merge both tabs, I need to work a bit on the original file.
Also, countries have weird symbols and other abbreviations. The source does not explain them. Should I just remove the conflicting characters or use other terminology like, say, World Bank's country codes?
Changing the data does not seem adequate...
Rufus Pollock
@rufuspollock
Apr 29 2016 12:02 UTC
@gsilvapt can you link to the issue in the core datasets registry for this data package.
Why would you have to merge tabs - this would be about having two different csv files in one data package, no?
Gustavo Silva
@gsilvapt
Apr 29 2016 12:05 UTC
Is that possible? I didn't know that
Rufus Pollock
@rufuspollock
Apr 29 2016 12:05 UTC
@gsilvapt yes multiple files in a data package is totally allowed :-) - something to add to the FAQ ;-)
Gustavo Silva
@gsilvapt
Apr 29 2016 12:05 UTC
This is the original discussion thread: datasets/registry#101
Which also shows the original source
Two possibilities: create different CSVs in one single data package (Rufus' suggestion), or create something like this: Country,Year,Historic GDP,GDP Per Capita,The third one, Austria,1,253,some number, other number,
The problem is that the source has GDP, GDP Per Capita and Population Numbers. The first two may combine, the last does not
@rgrp Is there any guide out there I can use to help me setup a data package with multiple csv files then?
Rufus Pollock
@rufuspollock
Apr 29 2016 12:08 UTC
@gsilvapt take a look at some of the existing datasets in github.com/datasets e.g. s-and-p-companies has two files i believe ..
Gustavo Silva
@gsilvapt
Apr 29 2016 12:08 UTC
Okay, that will do. Thanks and sorry for nagging so much with these two packages :sweat_smile:
Rufus Pollock
@rufuspollock
Apr 29 2016 12:08 UTC
@gsilvapt i would mirror the source data pretty close and actually have three files ...
@gsilvapt not nagging - asking questions is great :smile:
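A minimal sketch of what "three files in one data package" could look like, assuming the three CSVs sit in a `data/` folder. The package name, file names and the `make_descriptor` helper are hypothetical and only for illustration; the one thing taken from the Data Package format itself is that `datapackage.json` carries a `resources` list with one entry per file.

```python
import json
from pathlib import PurePosixPath


def make_descriptor(name, csv_paths):
    """Build a datapackage.json-style dict with one resource per CSV file."""
    return {
        "name": name,
        "resources": [
            # Resource name is derived from the file name without its extension.
            {"name": PurePosixPath(path).stem, "path": path}
            for path in csv_paths
        ],
    }


# Hypothetical file layout mirroring the three tables in the source.
descriptor = make_descriptor(
    "historical-gdp",
    ["data/gdp.csv", "data/gdp-per-capita.csv", "data/population.csv"],
)

# This is the text that would go into datapackage.json.
print(json.dumps(descriptor, indent=2))
```

Keeping one resource per source table avoids having to merge tabs that do not share the same shape.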
Gustavo Silva
@gsilvapt
Apr 29 2016 12:10 UTC
Okay, looking at https://github.com/datasets/s-and-p-500-companies helped me understand how it works. Now I'm feeling dumb :laughing:
Just going to have lunch and will take care of this soon :+1: Thanks for the help
Gustavo Silva
@gsilvapt
Apr 29 2016 12:21 UTC
Also, where can I edit the FAQ to add a few other details that I learned today about ensuring a data package's quality?
Rufus Pollock
@rufuspollock
Apr 29 2016 12:22 UTC
as a first pass can i suggest opening an issue on the core datasets registry and then we'll merge
Gustavo Silva
@gsilvapt
Apr 29 2016 12:24 UTC
I was hoping it was a GitHub repository that I could send a PR to, which you guys could accept if you agreed. But that will work too
Rufus Pollock
@rufuspollock
Apr 29 2016 12:25 UTC
which faq do you want to update?
Gustavo Silva
@gsilvapt
Apr 29 2016 13:02 UTC
http://data.okfn.org/doc/core-data-curators <- This one. The part referring to "Preparing Datasets as Core Data Packages"
Rufus Pollock
@rufuspollock
Apr 29 2016 13:03 UTC
ok - please dive in. the repo is at https://github.com/okfn/data.okfn.org-new/
Gustavo Silva
@gsilvapt
Apr 29 2016 13:04 UTC
Ah, great! Thanks!
Rufus Pollock
@rufuspollock
Apr 29 2016 13:49 UTC
@roll i have some questions re https://github.com/frictionlessdata/jsontableschema-sql-py and bigquery. Do we have code that actually takes a data package rather than a JTS + CSV or are you supposed to do that yourself by hand?
Gustavo Silva
@gsilvapt
Apr 29 2016 14:23 UTC
I am having an issue with this data package. If anyone could drop by and see if you know what is wrong, please let me know so that I can fix it. https://github.com/gsilvapt/historical-gdp/issues/4#issuecomment-215732909
Thanks :+1:
Gustavo Silva
@gsilvapt
Apr 29 2016 14:31 UTC
Okay, definitely not a cookie issue - I just tried with another package and it worked. I must say then I am doing something wrong with the .json file
@rgrp here driver level operates with schema+data (python objects) - https://github.com/frictionlessdata/jsontableschema-sql-py#tabular-storage
here JTS level operates with schema+data (files) - https://github.com/frictionlessdata/jsontableschema-py#sql
where the JTS and DP levels are tiny driver wrappers
Rufus Pollock
@rufuspollock
Apr 29 2016 15:16 UTC
@roll do you have any working code snippets? perhaps in gists? i.e. examples with real code or similar ... (/cc @danfowler - super useful for our tutorials)
also those links have working snippets
here is some really real code - https://github.com/opentrials/processors/blob/master/exporter/translators/openaire.py - nothing more than a pull/push_datapackage call
Rufus Pollock
@rufuspollock
Apr 29 2016 15:20 UTC
@roll i was hoping for something super simple ;-) but that's useful
Simple snippet:
# pip install sqlalchemy git+git://github.com/frictionlessdata/datapackage-py jsontableschema-sql
import sqlalchemy as sa
from datapackage import push_datapackage

# Engine
engine = sa.create_engine('sqlite:///:memory:')

# Import
push_datapackage(
    descriptor='https://github.com/datasets/country-codes/archive/master.zip', 
    backend='sql', engine=engine)

# Check
print(list(engine.execute('select * from data__country_codes')))
I really hope we can soon stop writing this git+git.... to install datapackage-py ..
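For readers who want to see the idea behind the snippet without installing anything, here is a rough standard-library sketch of loading one tabular resource into SQLite from a schema plus CSV text. It is not how `push_datapackage` is implemented. The `load_resource` helper, the type mapping, the table name and the two toy rows are all assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Assumed mapping from schema field types to SQLite column types;
# anything not listed falls back to TEXT.
SQLITE_TYPES = {"integer": "INTEGER", "number": "REAL", "boolean": "INTEGER"}


def load_resource(conn, table, schema, csv_text):
    """Create `table` from `schema` and insert the rows found in `csv_text`."""
    fields = schema["fields"]
    columns = ", ".join(
        '"{}" {}'.format(f["name"], SQLITE_TYPES.get(f.get("type", "string"), "TEXT"))
        for f in fields
    )
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))

    # Read the CSV by header name so column order follows the schema.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(row[f["name"]] for f in fields) for row in reader]
    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
    )
    return len(rows)


# Toy schema and data, not taken from any real dataset.
schema = {"fields": [{"name": "code", "type": "string"},
                     {"name": "value", "type": "integer"}]}
csv_text = "code,value\nAA,1\nBB,2\n"

conn = sqlite3.connect(":memory:")
inserted = load_resource(conn, "toy_table", schema, csv_text)
result = conn.execute("SELECT code, value FROM toy_table ORDER BY code").fetchall()
print(inserted, result)
```

The schema decides the column types, so the CSV strings in the `value` column are stored as integers by SQLite's column affinity.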
Rufus Pollock
@rufuspollock
Apr 29 2016 16:03 UTC
@roll you support zip files ;-) - i have to say it might be simpler if this was the plain url as the zip stuff is an extension ...
oh sorry it could be just https://raw.githubusercontent.com/datasets/country-codes/master/datapackage.json instead of zip link
also of course a local path could be used instead of a link
Rufus Pollock
@rufuspollock
Apr 29 2016 16:24 UTC
can you just pass the standard github path and it works too?
As far as I know it's not supported (checked - error) (cc @vitorbaptista)
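Until the plain GitHub path is supported, one possible workaround is to rewrite it into the raw `datapackage.json` link mentioned above before passing it on. The sketch below assumes the descriptor lives at the repository root on the `master` branch. `github_descriptor_url` is a hypothetical helper, not part of datapackage-py.

```python
from urllib.parse import urlparse


def github_descriptor_url(repo_url, branch="master"):
    """Turn https://github.com/<owner>/<repo> into the raw datapackage.json URL."""
    parts = urlparse(repo_url)
    if parts.netloc != "github.com":
        raise ValueError("not a github.com URL: {}".format(repo_url))

    # Keep only non-empty path segments so a trailing slash is harmless.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("expected /<owner>/<repo> in: {}".format(repo_url))

    owner, repo = segments[:2]
    return "https://raw.githubusercontent.com/{}/{}/{}/datapackage.json".format(
        owner, repo, branch
    )


url = github_descriptor_url("https://github.com/datasets/country-codes")
print(url)
```

The resulting URL can then be used wherever a descriptor link is accepted.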
Rufus Pollock
@rufuspollock
Apr 29 2016 16:33 UTC
@roll the thing we probably want in python is http://dataprotocols.org/data-package-identifier/
@rgrp frictionlessdata/datapackage-py#74 ?
Rufus Pollock
@rufuspollock
Apr 29 2016 16:43 UTC
:thumbsup: