These are chat archives for frictionlessdata/chat

23rd
Sep 2018
Andrea Borruso
@aborruso
Sep 23 2018 07:13

@zaneselvans thank you once again, but I'm still stuck. Could you run a test with my input data and verify that it's possible to read the inferred delimiter?

My CSV input file is

city;location
london;"51.50,-0.11"
paris;"48.85,2.30"
rome;"41.89,12.51"

I have used datapackage-py with

from datapackage import Package
package = Package()
package.infer('input.csv')

My resource descriptor, package.descriptor['resources'][0], has no dialect and therefore no delimiter:

{u'encoding': u'utf-8',
 u'format': u'csv',
 u'mediatype': u'text/csv',
 u'name': u'input',
 u'path': u'input.csv',
 u'profile': u'tabular-data-resource',
 u'schema': {u'fields': [{u'format': u'default',
    u'name': u'city',
    u'type': u'string'},
   {u'format': u'default', u'name': u'location', u'type': u'geopoint'}],
  u'missingValues': [u'']}}
Andrea Borruso
@aborruso
Sep 23 2018 07:54

@zaneselvans and I get the same result using Resource:

from datapackage import Resource
resource = Resource({u'path': 'input.csv'})
resource.infer()

This gives me the JSON below, without any dialect:

{u'encoding': u'utf-8',
 u'format': 'csv',
 u'mediatype': u'text/csv',
 u'name': 'input',
 u'path': 'input.csv',
 u'profile': u'tabular-data-resource',
 u'schema': {u'fields': [{u'format': u'default',
    u'name': u'city',
    u'type': u'string'},
   {u'format': u'default', u'name': u'location', u'type': u'geopoint'}],
  u'missingValues': [u'']}}
Zane Selvans
@zaneselvans
Sep 23 2018 08:25
I'm getting the same result. That seems weird. @roll is there some reason why the CSV dialect ought not to be inferred as a part of the Resource descriptor? Especially if it includes a non-standard delimiter?
Zane Selvans
@zaneselvans
Sep 23 2018 09:51
Hmm, the tabular data resource spec indicates that the dialect property is mandatory if the CSV file differs from the format specified in RFC 4180 in any way, which a file with a non-comma delimiter does. So it seems like this is a bug.
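
For reference, here is a minimal sketch of what the resource descriptor from this thread could look like with the dialect filled in by hand. It assumes the `delimiter` key from the CSV Dialect specification; the rest of the descriptor is copied from the inferred output above. Whether datapackage-py picks this property up when reading the file is not confirmed in this thread.

```python
import json

# Descriptor values copied from the inferred output in this thread;
# the "dialect" entry is added by hand (assumption: the CSV Dialect
# spec's "delimiter" property is the right place for the separator).
descriptor = {
    "name": "input",
    "path": "input.csv",
    "profile": "tabular-data-resource",
    "format": "csv",
    "mediatype": "text/csv",
    "encoding": "utf-8",
    "dialect": {"delimiter": ";"},
    "schema": {
        "fields": [
            {"name": "city", "type": "string", "format": "default"},
            {"name": "location", "type": "geopoint", "format": "default"},
        ],
        "missingValues": [""],
    },
}

# Serialize the descriptor as it could be stored in a datapackage.json
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```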
Andrea Borruso
@aborruso
Sep 23 2018 10:32

@zaneselvans thank you.

However, my goal is more general: I know that datapackage-py (or goodtables-py) can infer the CSV separator. I want to read it for a group of files and create a "separators report" for them, and datapackage-py seems like the right tool for that. But how do I read this inferred value?

I opened two related issues for goodtables some months ago: frictionlessdata/goodtables-py#262 and frictionlessdata/goodtables-py#270
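
Until the inferred delimiter is exposed, one possible workaround for the "separators report" is Python's standard-library `csv.Sniffer`. This is a sketch only, written for Python 3: it is not how datapackage-py or goodtables-py detect the separator, and the helper names `sniff_delimiter` and `separators_report`, the candidate delimiter list, and the sample size are all assumptions made for illustration.

```python
import csv


def sniff_delimiter(text, candidates=";,\t|"):
    """Return the delimiter guessed by csv.Sniffer, or None if it cannot decide."""
    try:
        return csv.Sniffer().sniff(text, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when no delimiter can be determined
        return None


def separators_report(paths, sample_size=4096):
    """Map each file path to the delimiter guessed from its first characters."""
    report = {}
    for path in paths:
        # Assumes UTF-8 files; only a leading sample is read, not the whole file
        with open(path, newline="", encoding="utf-8") as handle:
            report[path] = sniff_delimiter(handle.read(sample_size))
    return report


# The input file from this thread, inlined as a string
sample = (
    "city;location\n"
    'london;"51.50,-0.11"\n'
    'paris;"48.85,2.30"\n'
    'rome;"41.89,12.51"\n'
)
detected = sniff_delimiter(sample)
print(repr(detected))
```

For a list of files, `separators_report(['input.csv', 'other.csv'])` would return a dict from path to guessed delimiter (the file names here are placeholders). The Sniffer is a heuristic, so a `None` or surprising value for a file is worth checking by hand.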