Diego Díez Ricondo
@didiez

Our schema allows extra headers and missing non-required headers, so the csv could have fewer (or more) columns than the defined schema.

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    report = validate("datapackage.json", checks=["structure", "schema", "foreign-key"], order_fields=True, infer_fields=False)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/validate.py", line 80, in validate
    report = inspector.inspect(source, **options)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/inspector.py", line 82, in inspect
    table_warnings, table_report = task.get()
  File "/home/didiez/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/didiez/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/inspector.py", line 200, in __inspect_table
    success = prepare_func(stream, schema, extra)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/contrib/checks/foreign_key.py", line 48, in prepare
    current_resource_name=extra['resource-name'])
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/contrib/checks/foreign_key.py", line 116, in _get_relations
    relations[resource_name] = resource.read(keyed=True)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/datapackage/resource.py", line 377, in read
    foreign_keys_values=foreign_keys_values, **options)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/table.py", line 353, in read
    for count, row in enumerate(rows, start=1):
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/table.py", line 215, in iter
    for row_number, headers, row in iterator:
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/table.py", line 509, in builtin_processor
    row, row_number=row_number, exc_handler=exc_handler)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/schema.py", line 266, in cast_row
    error_data=keyed_row)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/helpers.py", line 90, in default_exc_handler
    raise exc
datapackage.exceptions.CastError: Row length 5 doesn't match fields count 6 for row "2"

If we only check 'structure' and 'schema' everything works as expected.
I could not find any reference in the docs about reading (and casting) csv files with fields not defined in the schema or with missing fields (non-required columns in the schema).
Should I open an issue, or is it a known limitation when reading csv files with goodtables?

roll
@roll
@didiez Please open an issue. Goodtables should never fail anyway
roll
@roll
The problem is that the current implementation assumes that the reference table is valid. Probably we need to emit a foreign-key error here saying that the reference table is not valid (instead of failing)
Oleg Lavrovsky
@loleg
^ in case anyone here knows Workbench, please feel free to comment
Diego Díez Ricondo
@didiez
@roll issue added frictionlessdata/goodtables-py#347. As a workaround I ended up pre-processing the source csv with pandas to add/remove/reorder columns to make it valid against the schema
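The workaround described above (adding, removing and reordering columns so the CSV matches the schema) can be sketched roughly as follows. Diego used pandas; this sketch swaps in the standard `csv` module so it runs without extra dependencies. The function name `align_to_schema` and the sample field names are hypothetical, chosen only for illustration.

```python
import csv
import io


def align_to_schema(csv_text, field_names, fill=""):
    """Return CSV text whose columns are exactly field_names, in that order.

    Hypothetical helper: columns not listed in field_names are dropped,
    and listed columns that are missing from the input are filled with `fill`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=field_names,
        extrasaction="ignore",  # silently drop columns the schema does not define
        restval=fill,           # value written for columns absent from the input
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Example: "extra" is not in the schema, "age" is missing from the file.
source = "id,extra,name\n1,x,Ann\n2,y,Bob\n"
print(align_to_schema(source, ["id", "name", "age"]))
```

The aligned text can then be written to a temporary file and validated in place of the original source.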
Matt Melton
@mattmelton
hi - I'm having issues with goodtables validate on a file-like object, specifically a simple file IO via open("test.csv", "r")... validate('test.csv') works but validate(fileHandle) doesn't. The error I get is: "No such file or directory: 'inline'"
Egwuenu Gift
@lauragift21
Hi everyone, I'm happy to share that I recently joined the Frictionless Data team as a Developer Evangelist. My role involves spreading the word about Frictionless Data and encouraging community involvement. I'm always open to help and have discussions about Frictionless Data. https://www.datopian.com/blog/2020/03/20/joining-the-frictionless-data-team/
Matt Melton
@mattmelton
the error No such file or directory: 'inline' was masking the actual error. To read from a file stream it must be binary, you must specify the format and it cannot live in a tmp path. Unfortunately the logic behind _local_file_not_found inside inspector.py - return urlparse(source).scheme == '' and not os.path.isfile(source) - masks the true error.
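To see why that check can hide the real problem, here is a minimal standalone sketch that reuses only the expression quoted above. The function name `local_file_not_found` and the sample inputs are illustrative; how goodtables actually arrives at the string `'inline'` is not shown here and is an assumption based on the error message.

```python
import os
from urllib.parse import urlparse


def local_file_not_found(source):
    # Same expression as the one quoted from inspector.py above:
    # any string without a URL scheme that is not an existing file
    # is reported as a missing local file.
    return urlparse(source).scheme == '' and not os.path.isfile(source)


# A placeholder name is treated exactly like a mistyped file path,
# so the reported error says nothing about what really went wrong.
print(local_file_not_found("no-such-placeholder-name"))      # True
print(local_file_not_found("http://example.com/data.csv"))   # False
```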
roll
@roll
Hi @mattmelton could you please create an issue?
Matt Melton
@mattmelton
sure
Matt Melton
@mattmelton
@roll #349
Matt Melton
@mattmelton
does the python validator support the rdfType field? as far as I can tell it doesn't do anything
Egwuenu Gift
@lauragift21
Hi, @roll I'm having issues getting goodtables js library to work. I have a demo here https://repl.it/@lauragift21/goodtables-js and this is the error returned when trying to validate a CSV file. (node:66) UnhandledPromiseRejectionWarning: Error: Can't create a job on API. Reason: "Error: Request failed with status code 403"
roll
@roll
@mattmelton No it doesn't at the moment
@lauragift21 Please create an issue and I'll investigate. goodtables-js is a goodtables.io wrapper, so something is probably wrong with the API endpoint
Egwuenu Gift
@lauragift21
Alright sure.
Matt Melton
@mattmelton
I'm trying to figure out a solution to validate the range of dates in a CSV. Unfortunately the Excel short date format varies by locale, i.e. dd/mm/yyyy in most of the world and mm/dd/yyyy in the US and US-centric places like Abu Dhabi. I might have to do multiple validation passes: the first to see if it's a well-formed date, the second to peek and auto-detect the format, and the third to validate the range, i.e. [dd/mm/yyyy, dd/mm/yyyy] or [mm/dd/yyyy, mm/dd/yyyy]. Has anyone attempted to solve a similar problem?
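The three passes described above could look roughly like this in plain Python. This is only a sketch under stated assumptions: the function names are hypothetical, the whole column is assumed to use a single format, and a column where no value has a component above 12 is reported as ambiguous rather than guessed.

```python
import re
from datetime import date, datetime

DATE_RE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")


def detect_format(values):
    """Passes 1 and 2: check the shape, then infer the day/month order.

    Returns "%d/%m/%Y", "%m/%d/%Y", or None when the column is ambiguous.
    Raises ValueError for malformed values or a column mixing both orders.
    """
    first_gt_12 = second_gt_12 = False
    for value in values:
        match = DATE_RE.match(value.strip())
        if not match:
            raise ValueError(f"not a well-formed date: {value!r}")
        first, second = int(match.group(1)), int(match.group(2))
        first_gt_12 |= first > 12    # first component can only be a day
        second_gt_12 |= second > 12  # second component can only be a day
    if first_gt_12 and second_gt_12:
        raise ValueError("column mixes dd/mm and mm/dd dates")
    if first_gt_12:
        return "%d/%m/%Y"
    if second_gt_12:
        return "%m/%d/%Y"
    return None


def out_of_range(values, fmt, start, end):
    """Pass 3: parse with the detected format and collect values outside [start, end]."""
    bad = []
    for value in values:
        # strptime also rejects impossible dates such as 31/02/2020
        parsed = datetime.strptime(value.strip(), fmt).date()
        if not (start <= parsed <= end):
            bad.append(value)
    return bad


column = ["25/12/2019", "01/02/2020"]
fmt = detect_format(column)
print(fmt, out_of_range(column, fmt, date(2020, 1, 1), date(2020, 12, 31)))
```

When `detect_format` returns `None`, the format has to come from somewhere else, for example a locale hint supplied with the file.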
Mel Fagundes
@melffagundes
Hi, I can't access the field guide page. Does anyone have the same problem?
Lilly Winfree
@lwinfree
Hi @melffagundes! We just released a new version of the Frictionless Data website, so there are some link redirects that are still in progress. For the new website, the field guide sections have been separated into individual blogs. We are working on creating a new guide, which is here (as a work in progress) https://frictionlessdata.io/guide/.
The previous field guide content can be found in these separate links for the time being:
Let us know if you can’t find other documents or if you need help! thanks!
Lilly Winfree
@lwinfree
You can also view it at http://old.frictionlessdata.io/field-guide/ for the time being
Emerson Rocha
@fititnt

I guess I found a bug in the DataPackage Viewer (https://data.okfn.org/tools/view), but I'm not sure where to report it. Which GitHub repo should I open the issue in?

While the specification allows the resource path to be either a file on local disk or a remote resource (and most implementations seem to treat the path as a remote resource if it starts with 'http'), the DataPackage Viewer on the website doesn't do this check and assumes it is always a local file.

Here is an example where it fails: https://data.okfn.org/tools/view?url=https%3A%2F%2Fgithub.com%2Fcovid-taskforce-cplp%2Fdados-v1. The download path for 'covid-casos-brasil' should be 'https://brasil.io/dataset/covid19/caso?format=csv', but the download button shows https://raw.github.com/covid-taskforce-cplp/dados-v1/master/https://brasil.io/dataset/covid19/caso?format=csv
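The check described above (treat a path starting with 'http' as remote, otherwise resolve it against the package location) can be sketched as follows. This is not the viewer's code; `resolve_resource_path` is a hypothetical helper, and the relative path `data/casos.csv` is made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def resolve_resource_path(base_url, path):
    """Return the download URL for a resource path.

    Hypothetical helper: remote paths (http/https) are returned unchanged,
    anything else is resolved relative to base_url.
    """
    if urlparse(path).scheme in ("http", "https"):
        return path
    return urljoin(base_url, path)


base = "https://raw.github.com/covid-taskforce-cplp/dados-v1/master/"

# Remote resource: must not be prefixed with the repository URL.
print(resolve_resource_path(base, "https://brasil.io/dataset/covid19/caso?format=csv"))

# Local resource (made-up path): resolved against the repository URL.
print(resolve_resource_path(base, "data/casos.csv"))
```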

Egwuenu Gift
@lauragift21
Hi @fititnt, you can create an issue for this in the forum http://github.com/frictionlessdata/forum
Rufus Pollock
@rufuspollock

@slythfox

Is there a preferred copyright notice for FD contributions? I’ve seen all across the board for different DataPackage/TableSchema implementations

Can you open an issue in the forum https://github.com/frictionlessdata/forum/issues - and we can answer there.

Hi everyone, I'm happy to share that I recently joined the Frictionless Data team as a Developer Evangelist. My role involves spreading the word about Frictionless Data and encouraging community involvement. I'm always open to help and have discussions about Frictionless Data. https://www.datopian.com/blog/2020/03/20/joining-the-frictionless-data-team/

@lauragift21 a big welcome!

Michael Amadi
@michaelamadi
New website looks pretty good! :clap: :+1:
Michael Amadi
@michaelamadi
Hi @lwinfree, I can't see community tooling contributions or case studies on the new site. Do these have a new home?
Lilly Winfree
@lwinfree
Hi @michaelamadi! Good catch. The plan is to have them here in the near future: https://frictionlessdata.io/tooling/labs Please continue to give us your thoughts/ask questions! We are planning on iterating over this release. Thanks!
Michael Amadi
@michaelamadi
@lwinfree Makes sense :)
Emerson Rocha
@fititnt
@lauragift21 thank you!
roll
@roll
Hi people! sorry if some of you got noisy close/open messages from the FD issue trackers. I had to sync our issues with an external system and it was the only way
Rufus Pollock
@rufuspollock

Trying out Discord (as replacement for gitter and this channel) ☎️

We are thinking of moving off gitter - see frictionlessdata/forum#13

We think Discord could be good and we are trying it out now - if you'd like to join the new channel you can hop on via this link:

https://discord.gg/2UgfM2k

📦🚀

Rufus

Egwuenu Gift
@lauragift21

Hi, I'm excited to share we'll be hosting a Frictionless Data virtual hangout on 20th April at 5 PM (CET). This is a great opportunity to learn what's been going on in the Frictionless Data community. Can't wait to see you there! :slight_smile:
https://frictionlessdata.io/blog/2020/04/16/annoucing-frictionless-data-virtual-hangout/

Link to register:
https://zoom.us/meeting/register/tJEqdOyspzgvG9wlVM_3Z_6yyL8wzc-v03Bq

Mel Fagundes
@melffagundes
Hi, everyone! Was the Frictionless Data Virtual Community Hangout recorded?
Lilly Winfree
@lwinfree
Hi @melffagundes yes! Gift is writing a blog post about the event that will include the video link I think
Egwuenu Gift
@lauragift21
Hi, @melffagundes I'll share a post and recording here when it gets published this week.
Egwuenu Gift
@lauragift21

Hi everyone! Here's a recap blogpost from the frictionless data community hangout.

https://frictionlessdata.io/blog/2020/04/28/recap-post-frictionless-data-hangout-april-2020/

We'll be hosting another in May. Link to register: https://us02web.zoom.us/meeting/register/tZMsf-qrrjopHtGZwMyM7tCmp_YyPlNms6wK

Look forward to seeing you there.

Rufus Pollock
@rufuspollock

Come and join discord https://discord.gg/2UgfM2k

Hi all, we are using discord quite a bit and it seems to be working well. Please come on over 😄
Brian Mc Donald
@brianmcdonald
Hi all. I have perhaps a silly question. In a table schema datapackage.json, is it possible to set a constraint such as "column A must equal the sum of columns B, C and D"?
Lilly Winfree
@lwinfree
Hi @brianmcdonald ! We are not using this Gitter chat anymore and instead are over in Discord. But, to answer your question, please check this out: https://specs.frictionlessdata.io/table-schema/#constraints If you have other questions, please ask them in Discord (https://discord.gg/2UgfM2k) :-)
Brian Mc Donald
@brianmcdonald

Thanks Lilly. I tried logging in to the discord chat but it forces me to verify by providing my phone number (or install an android app if I use my phone). I don't like having such personal data/privacy trade-offs so unfortunately the Discord room is not an option for me.

Reading a bit more, I think table schema constraints can only be applied to individual fields, but goodtables can provide the check I'm looking for.
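For illustration, the kind of cross-column rule in question ("A must equal B + C + D") is a row-level check rather than a per-field constraint. The sketch below is independent of goodtables and uses only the standard library; the function name `rows_violating_sum` and the sample data are hypothetical.

```python
import csv
import io
import math


def rows_violating_sum(csv_text, total, parts):
    """Return the row numbers where `total` differs from the sum of `parts`.

    Hypothetical helper. Row numbers count the header as row 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row_number, row in enumerate(reader, start=2):
        expected = sum(float(row[name]) for name in parts)
        # isclose avoids false alarms from floating-point rounding
        if not math.isclose(float(row[total]), expected):
            bad.append(row_number)
    return bad


data = "A,B,C,D\n6,1,2,3\n10,1,2,3\n"
print(rows_violating_sum(data, "A", ["B", "C", "D"]))
```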

Lilly Winfree
@lwinfree
Hi @brianmcdonald that is good feedback. I still monitor this chat, so if you need help you can use this & you will (perhaps slowly) get an answer :-) You can also open an issue on GitHub or email us too for help!
Rufus Pollock
@rufuspollock

Thanks Lilly. I tried logging in to the discord chat but it forces me to verify by providing my phone number (or install an android app if I use my phone). I don't like having such personal data/privacy trade-offs so unfortunately the Discord room is not an option for me.

@brianmcdonald i was able to sign up via web w/o providing phone info at all ...

ssowj
@ssowj
Hello! I'm looking to validate a csv file against an Oracle table schema, which in JSON is {"uniq_id": "NUMBER(38, 0)", "lat": "NUMBER(7, 5)", "lon": "NUMBER(8, 5)", "orig_val": "NUMBER(9, 3)", "orig_type": "VARCHAR(1)"}. Do I have to modify this json to use it with the library?
ssowj
@ssowj
I'm lost on how to define the table schema to validate against my csv
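One possible starting point is to translate the Oracle column types into a Table Schema descriptor (a `fields` list with `name`, `type` and optional `constraints`). The mapping below is an assumption made for this sketch, not an official conversion: `NUMBER` with scale 0 becomes `integer`, other `NUMBER` columns become `number`, and `VARCHAR(n)` becomes `string` with `maxLength` n. Numeric precision is not enforced, and `oracle_to_table_schema` is a hypothetical helper name.

```python
import json
import re

# Matches types such as "NUMBER(38, 0)" or "VARCHAR(1)"
ORACLE_TYPE = re.compile(r"^(\w+)\((\d+)(?:,\s*(\d+))?\)$")


def oracle_to_table_schema(columns):
    """Build a Table Schema descriptor from a {column: oracle_type} mapping.

    Hypothetical helper; the type mapping is an assumption for illustration.
    """
    fields = []
    for name, oracle_type in columns.items():
        match = ORACLE_TYPE.match(oracle_type.strip())
        if not match:
            raise ValueError(f"unrecognised type for {name!r}: {oracle_type!r}")
        kind, size, scale = match.group(1).upper(), int(match.group(2)), match.group(3)
        if kind == "NUMBER":
            # No scale or scale 0 means whole numbers only
            field_type = "integer" if scale in (None, "0") else "number"
            fields.append({"name": name, "type": field_type})
        elif kind in ("VARCHAR", "VARCHAR2", "CHAR"):
            fields.append(
                {"name": name, "type": "string", "constraints": {"maxLength": size}}
            )
        else:
            raise ValueError(f"no mapping defined for Oracle type {kind!r}")
    return {"fields": fields}


oracle_columns = {
    "uniq_id": "NUMBER(38, 0)",
    "lat": "NUMBER(7, 5)",
    "lon": "NUMBER(8, 5)",
    "orig_val": "NUMBER(9, 3)",
    "orig_type": "VARCHAR(1)",
}
print(json.dumps(oracle_to_table_schema(oracle_columns), indent=2))
```

The resulting JSON can be saved as the schema file and adjusted by hand, for example to add `required` constraints.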
Lilly Winfree
@lwinfree
hi @ssowj! We no longer use this chat. We are now using Discord: https://discord.gg/2UgfM2k. Can you please post your question there? thanks!
ssowj
@ssowj
Thank you!