@zaneselvans @aborruso you seem to have found a bug here: the dialect is missing.
Generally, I want to ask a question about how the packages fit together. I'd prefer more of a toolkit, where you use the tools to build your data package as needed.
I'm raising this here because I suspect we have a bug either in the infer tool or in how its output is combined into the data package. At the moment it's hard to tell which.
I think it would be more transparent to the user to have a setup where you do:
schema = infer('my.csv') // a simple dictionary
resource = new Resource()
resource.schema = schema
dataset = new Dataset()
dataset.addResource(resource)
More on this here http://okfnlabs.org/blog/2018/02/15/design-pattern-for-a-core-data-library.html
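To make the proposal concrete, here is a minimal Python sketch of the same steps using plain dictionaries. The helpers `infer`, `new_resource` and `add_resource` are hypothetical stand-ins, not the datapackage-py or tableschema-py API, and `infer` is deliberately naive (every field is typed "string"). The point is only that each step's output is an ordinary value you can inspect, so a missing key such as the dialect can be traced to the step that dropped it.

```python
import csv
import io


def infer(text):
    """Hypothetical inference tool: returns a plain Table Schema-style dict.

    Deliberately naive: every column is typed "string"; a real tool would
    also guess types and formats.
    """
    header = next(csv.reader(io.StringIO(text)))
    return {"fields": [{"name": name, "type": "string"} for name in header]}


def new_resource(path):
    """Hypothetical constructor: a resource is just a descriptor dict."""
    return {"path": path}


def add_resource(dataset, resource):
    """Hypothetical helper: append a resource descriptor to a dataset dict."""
    dataset.setdefault("resources", []).append(resource)
    return dataset


text = "id,name\n1,Rome\n2,Milan\n"   # stands in for the contents of my.csv
schema = infer(text)                  # step 1: a simple dictionary
resource = new_resource("my.csv")     # step 2: an empty resource
resource["schema"] = schema           # step 3: attach the schema explicitly
dataset = add_resource({}, resource)  # step 4: add the resource to a dataset
```

Because every intermediate value is a plain dict, you can print or diff `schema` and `resource` after each step.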
tabulator
For now, I would recommend using Python's built-in csv.Sniffer as a workaround (https://github.com/frictionlessdata/tabulator-py/blob/master/tabulator/parsers/csv.py#L102), e.g. dialect = csv.Sniffer().sniff(resource.read(limit=100))
Another option (it was the intended behavior for the infer function in this iteration) is a manual step that adds resource.dialect by hand.
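For the manual option, a sketch of turning a sniffed Python dialect into a dialect dict and attaching it to a resource descriptor by hand. The key names (`delimiter`, `quoteChar`, `doubleQuote`, `skipInitialSpace`) follow the Frictionless CSV Dialect spec as I understand it; the helper name `sniffed_to_csv_dialect` and the in-memory sample are illustrative assumptions. The resulting dict has the same shape as the descriptor passed to Resource(...) later in this thread, with an extra "dialect" key.

```python
import csv


def sniffed_to_csv_dialect(sample, delimiters=",;\t|"):
    """Hypothetical helper: sniff a text sample and map Python's dialect
    attributes onto CSV Dialect descriptor keys."""
    sniffed = csv.Sniffer().sniff(sample, delimiters=delimiters)
    return {
        "delimiter": sniffed.delimiter,
        "quoteChar": sniffed.quotechar,
        "doubleQuote": sniffed.doublequote,
        "skipInitialSpace": sniffed.skipinitialspace,
    }


# In practice `sample` would be the first KB or so of the file's text.
sample = "name;city\nRossi;Palermo\nBianchi;Torino\n"

descriptor = {"path": "input.csv"}
# The manual step: declare the dialect explicitly in the resource descriptor.
descriptor["dialect"] = sniffed_to_csv_dialect(sample)
```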
@roll thank you. In my opinion, knowing and declaring the CSV separator is very important. It's like the encoding: if the end user doesn't know it, they will lose time, sometimes a lot of time.
I think the separator (for CSV files) should always be included in the datapackage info.
It's a feature request.
Thank you all again.
@roll if I run
import csv
from datapackage import Resource
resource = Resource({u'path': 'input.csv'})
dialect = csv.Sniffer().sniff(resource.read(limit=100))
I get:
TypeError Traceback (most recent call last)
<ipython-input-34-735638ac6f72> in <module>()
----> 1 dialect = csv.Sniffer().sniff(resource.read(limit=100))
/usr/lib/python2.7/csv.pyc in sniff(self, sample, delimiters)
180
181 quotechar, doublequote, delimiter, skipinitialspace = \
--> 182 self._guess_quote_and_delimiter(sample, delimiters)
183 if not delimiter:
184 delimiter, skipinitialspace = self._guess_delimiter(sample,
/usr/lib/python2.7/csv.pyc in _guess_quote_and_delimiter(self, data, delimiters)
221 '(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'): # ".*?" (no delim, no space)
222 regexp = re.compile(restr, re.DOTALL | re.MULTILINE)
--> 223 matches = regexp.findall(data)
224 if matches:
225 break
TypeError: expected string or buffer
What's wrong with my code?
As a temporary workaround I've been using the code below, but I would like to use your approach:
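import csv
# Python 2.7: binary mode returns str; sniff the first two lines of the file
with open('input.csv', 'rb') as csvfile:
    # readline() keeps the trailing newline, so no extra '\n' is needed
    temp_lines = csvfile.readline() + csvfile.readline()
dialect = csv.Sniffer().sniff(temp_lines, delimiters=',\t;|')
dialect.delimiter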
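A variant of that workaround which sniffs a fixed-size sample (the pattern shown in the Python csv documentation) and then parses the whole file with the detected dialect. This sketch assumes Python 3; the helper name, the 1024-character sample and the fallback to `csv.excel` when sniffing raises `csv.Error` are my own choices, not something datapackage or tabulator does.

```python
import csv
import os
import tempfile


def read_with_sniffed_dialect(path, sample_size=1024, delimiters=",\t;|"):
    """Sniff the dialect from the first `sample_size` characters, then parse the file."""
    with open(path, newline="") as csvfile:
        sample = csvfile.read(sample_size)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
        except csv.Error:
            # Sniffing failed (e.g. a single-column file): fall back to comma-separated.
            dialect = csv.excel
        csvfile.seek(0)  # rewind so the reader sees the whole file, not just the rest
        return dialect.delimiter, list(csv.reader(csvfile, dialect))


# Demo on a throwaway semicolon-separated file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("city;population\nRoma;2873000\nMilano;1352000\n")
delimiter, rows = read_with_sniffed_dialect(tmp.name)
os.remove(tmp.name)
```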
@aborruso Ah, my bad, it should be resource.raw_read() (without the limit argument, I think).
It works perfectly, thank you
import csv
from datapackage import Resource
resource = Resource({u'path': 'input.csv'})
dialect = csv.Sniffer().sniff(resource.raw_read())
dialect.delimiter
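Why the first attempt failed: `Sniffer.sniff()` needs raw text so it can look for separators and quotes, while `resource.read()` returns already parsed rows (a list of rows), hence the `TypeError: expected string or buffer`. On Python 3, `raw_read()` returns bytes as far as I know, so they need decoding before sniffing. A hedged sketch; `sniff_delimiter`, the UTF-8 default and the sample data are assumptions, and the encoding must match the actual file.

```python
import csv


def sniff_delimiter(raw, encoding="utf-8", sample_size=1024, delimiters=",\t;|"):
    """Return the delimiter guessed from raw file content (hypothetical helper).

    `raw` may be bytes (decoded here) or str. Parsed rows such as
    [["a", "b"], ["1", "2"]] are rejected, because the sniffer needs the
    original text to see separators and quotes.
    """
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    if not isinstance(raw, str):
        raise TypeError("sniffing needs raw text, got %s" % type(raw).__name__)
    return csv.Sniffer().sniff(raw[:sample_size], delimiters=delimiters).delimiter


# Stand-in for resource.raw_read(): raw bytes of a semicolon-separated file.
raw_bytes = "città;abitanti\nRoma;2873000\nMilano;1352000\n".encode("utf-8")
print(sniff_delimiter(raw_bytes))  # prints ;
```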