These are chat archives for frictionlessdata/chat

Sep 2018
Rufus Pollock
Sep 24 2018 07:35 UTC

@zaneselvans @aborruso you seem to have found a bug here with the lack of dialect.

Generally, I want to ask a question about how the packages fit together. I have a preference for more of a toolkit approach, where you use the tools to create your data package as needed.

I'm mentioning this here because I would guess we've either got a bug in the infer tool or in the combining of that output into the data package. At the moment it's tough to tell which.

I think it would be more transparent to the user to have a set up where you do:

schema = infer('my.csv')   // a simple dictionary
resource = new Resource()
resource.schema = schema
dataset = new Dataset()

More on this here

@aborruso Sorry, I have checked the codebase and it seems there is no dialect inference implemented for now. The underlying libraries guess and use the dialect internally but don't expose this information to higher levels, so we probably have to implement it starting from tabulator. For now, I would recommend using Python's built-in csv.Sniffer as a workaround, e.g. dialect = csv.Sniffer().sniff( Another option (this was the intended behavior for the infer function in this iteration) is a manual step of adding resource.dialect by hand.
Andrea Borruso
Sep 24 2018 09:03 UTC

@roll thank you. In my opinion it is very important to know and declare the CSV separator. It's like encoding: if the end user does not know it, they will lose time, sometimes a lot of time.
I think that the separator (for CSV files) should always be in the datapackage info.

It's a feature request

Thank you again to all of you

Andrea Borruso
Sep 24 2018 09:33 UTC

@roll if I run

from datapackage import Resource
resource = Resource({u'path': 'input.csv'})
dialect = csv.Sniffer().sniff(

I have

TypeError                                 Traceback (most recent call last)
<ipython-input-34-735638ac6f72> in <module>()
----> 1 dialect = csv.Sniffer().sniff(

/usr/lib/python2.7/csv.pyc in sniff(self, sample, delimiters)
    181         quotechar, doublequote, delimiter, skipinitialspace = \
--> 182                    self._guess_quote_and_delimiter(sample, delimiters)
    183         if not delimiter:
    184             delimiter, skipinitialspace = self._guess_delimiter(sample,

/usr/lib/python2.7/csv.pyc in _guess_quote_and_delimiter(self, data, delimiters)
    221                       '(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'):                            #  ".*?" (no delim, no space)
    222             regexp = re.compile(restr, re.DOTALL | re.MULTILINE)
--> 223             matches = regexp.findall(data)
    224             if matches:
    225                 break

TypeError: expected string or buffer

What's wrong in my code?

A temporary workaround has been the code below, but I would like to use your code

import csv
with open('input.csv', 'rb') as csvfile:
    temp_lines = csvfile.readline() + '\n' + csvfile.readline()
    dialect = csv.Sniffer().sniff(temp_lines, delimiters=',\t;|')
Sep 24 2018 10:12 UTC
Save the Date! csv,conf,v4 is happening! We will be heading back to the Eliot Centre in Portland on May 8-9 next year for more talks about data sharing and data analysis from science, journalism, government, and open source. more announcements in the next few weeks
Sign up to Slack for the latest updates and
@aborruso Ahh my bad it should be resource.raw_read (without the limit argument I think)
Andrea Borruso
Sep 24 2018 12:27 UTC
Ok @roll I will try, thank you
Andrea Borruso
Sep 24 2018 12:53 UTC

@aborruso Ahh my bad it should be resource.raw_read (without the limit argument I think)

It works perfectly, thank you

import csv
from datapackage import Resource
resource = Resource({u'path': 'input.csv'})
dialect = csv.Sniffer().sniff(resource.raw_read())
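
For anyone who wants to record the sniffed separator next to the resource by hand, here is a minimal sketch using only the standard library. The helper name sniff_dialect, the inline sample text, and the plain dict standing in for a resource descriptor are all assumptions for illustration; the key names are meant to follow the CSV Dialect spec, and nothing here is an existing datapackage feature.

```python
import csv


def sniff_dialect(sample, delimiters=",\t;|"):
    """Guess the CSV dialect of a text sample and return it as a plain dict.

    Hypothetical helper: the keys are intended to mirror the CSV Dialect
    spec, but the result is just a dict to be merged into a descriptor.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    return {
        "delimiter": dialect.delimiter,
        "quoteChar": dialect.quotechar,
        "doubleQuote": dialect.doublequote,
        "skipInitialSpace": dialect.skipinitialspace,
    }


# Inline sample standing in for the first lines of input.csv (assumption).
sample = "id;name;city\n1;Ada;London\n2;Linus;Helsinki\n"

# Plain dict standing in for a resource descriptor, with the dialect added by hand.
descriptor = {"path": "input.csv", "dialect": sniff_dialect(sample)}
print(descriptor["dialect"]["delimiter"])
```

In practice the sample would come from the file itself, for example the text returned by resource.raw_read() as above.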
Andrea Borruso
Sep 24 2018 13:02 UTC
@roll what about also adding the delimiter as a feature in the managed info?
Robert Gieseke
Sep 24 2018 14:42 UTC
Hi all, the latest release of the Pandas Datapackage reader can now also read GeoJSON into GeoPandas DataFrames:
David Cottrell
Sep 24 2018 14:49 UTC
Asking a question again, sorry if I missed an answer in the rolling chat. Where should datapackage "getters" go for non-local data? Is this a datapipeline or is there some feature of datapackages themselves I have missed? For example, wget <url> plus a schema, delimiter etc.