roll
@roll
It will generate the specs
Grunt is not needed
Rufus Pollock
@rufuspollock

📰 Want to track progress on the big FD v2 update? See this issue …

frictionlessdata/project#415

We’re making great progress in our sprint this week and here’s the current high-level status:

  • [x] Set up vuepress based site #416
  • [ ] (70%) Migrate content #417
  • [ ] (80%) Write and add new content #411
  • [ ] (60%) Theme it ... #402
  • [ ] Better Search (minor)
  • [ ] Switch over DNS
Augusto Herrmann
@augusto-herrmann
Hi. So I figure from #417 that the next step is to migrate the articles (#409), right?
Augusto Herrmann
@augusto-herrmann
@rufuspollock closed #409 as a duplicate of #217, but I think you meant some other issue instead
Rufus Pollock
@rufuspollock
@augusto-herrmann correct and fixed. 417 is the focus - though we are nearing completion. Maybe the best item would be to look at frictionlessdata/forum#5 and see if you can liaise with @johanricher on a plan of work there ...
Paul Girard
@paulgirard
@rufuspollock Got it thx!
Paul Girard
@paulgirard
I've just added a three-solution proposal in frictionlessdata/forum#1. Let me know.
Philip Durbin
@pdurbin
@lwinfree (and others), hi! Great talk at https://fosdem.org/2020/schedule/event/open_research_frictionless_data/ ! I believe you mentioned an integration with Zenodo but I'm having trouble finding any information about this. Can someone please drop a link in here?
Lilly Winfree
@lwinfree
Hi @pdurbin! For Zenodo, I was referring to our recent Pilot collaboration with Catalyst Co-op, where they packaged US energy data into datapackages & then uploaded to Zenodo. I think this issue is the best place to read a summary: catalyst-cooperative/pudl#425 See also: https://github.com/catalyst-cooperative/pudl/blob/master/src/pudl/load/metadata.py & https://github.com/catalyst-cooperative/pudl/blob/master/docs/datapackages.rst for more info. Let me know if you have other questions, or if you'd like to talk to the Catalyst team!
Philip Durbin
@pdurbin
Thanks! This is making more sense now. I was pretty confused, thinking that Frictionless Data has a server-side component that transfers files from a Frictionless Data server to Zenodo.
Lilly Winfree
@lwinfree
Nope! That would be interesting though
Philip Durbin
@pdurbin
:)
Jarek Skrzypek
@jareks
I am looking for a way to create and read dynamic tables with sqlalchemy in a webapp. It seems like I could use tableschema-sql to create a table and then sqlalchemy automap to use it. Or is there some other recommended approach?
slythfox
@slythfox
Hey, so it's been a long time since I've checked in. I wrote a partial Swift language implementation of [Tabular] Data Package and Table Schema for my own project (which is used in the shipped Quotemarks iOS app). It is unlikely I'll have the time/energy to complete the entire specification (e.g. zipping, data inference, every single data type). I haven't yet open sourced it since it's incomplete. If someone on here wants to complete the Swift work, reach out to me. Implementation status is here: frictionlessdata/software#29
I did write a dialectal CSV parser in Swift since one did not exist already. That's open sourced here: https://github.com/slythfox/csv-dialect-swift
roll
@roll
@jareks Hi, yes, tableschema-sql should be the right tool for the job
@slythfox It would be great to have it in the implementations family. I think at some point we will find a way to finish it cc @lwinfree
Philip Durbin
@pdurbin
Frictionless Data keeps coming up in my world. Just now: https://github.com/IQSS/dataverse/issues/6678#issuecomment-591564656
JD Bothma
@jbothma
anyone here who can poke someone who works with OpenSpending? I'm trying to upload a dataset quite urgently and I'm getting an S3 error from the data loader, I think
JD Bothma
@jbothma
never mind - it was an issue with parens or plusses in the filename
Rufus Pollock
@rufuspollock

Frictionless Data keeps coming up in my world. Just now: https://github.com/IQSS/dataverse/issues/6678#issuecomment-591564656

😄 is there anything we can contribute to that discussion?

Paul Girard
@paulgirard
@roll @rufuspollock I opened an issue a month ago about multipart header frictionlessdata/datapackage-py#256
I have a commit ready for review in there, should I create a PR for this?
roll
@roll
@paulgirard Sure, let's try
Paul Girard
@paulgirard
ok I'll update my fork with new commits to avoid merge issues and submit a PR
roll
@roll
Great. Thanks!
Philip Durbin
@pdurbin
@lwinfree et al., do you want in on https://github.com/researchdatamaniacs ?
Philip Durbin
@pdurbin
I could use some help with the charter: researchdatamaniacs/crazyideas#3
Diego Díez Ricondo
@didiez
Hi all, I couldn't find in the docs a way to get a report generated by goodtables.validate(..) including the actual row data (tables.errors.row), I can only retrieve the row-number (tables.errors.row-number). Am I missing some config/flag to enable this behaviour? http://try.goodtables.io/ is showing the actual cell values. Many thanks and great job!
Rufus Pollock
@rufuspollock
@didiez thanks for asking - could you open an issue in the forum with details and we’ll try to debug / answer there https://github.com/frictionlessdata/forum/issues
slythfox
@slythfox
Is there a preferred copyright notice for FD contributions? I've seen it vary across the board for different DataPackage/TableSchema implementations
Diego Díez Ricondo
@didiez
we are trying to validate two csv files with foreign keys between them, but when activating the 'foreign-key' check on validate(..) an exception is thrown
Diego Díez Ricondo
@didiez

Our schema allows extra headers and missing non-required headers, so the csv could have fewer (or more) columns than the defined schema.

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    report = validate("datapackage.json", checks=["structure", "schema", "foreign-key"], order_fields=True, infer_fields=False)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/validate.py", line 80, in validate
    report = inspector.inspect(source, **options)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/inspector.py", line 82, in inspect
    table_warnings, table_report = task.get()
  File "/home/didiez/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/didiez/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/inspector.py", line 200, in __inspect_table
    success = prepare_func(stream, schema, extra)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/contrib/checks/foreign_key.py", line 48, in prepare
    current_resource_name=extra['resource-name'])
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/goodtables/contrib/checks/foreign_key.py", line 116, in _get_relations
    relations[resource_name] = resource.read(keyed=True)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/datapackage/resource.py", line 377, in read
    foreign_keys_values=foreign_keys_values, **options)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/table.py", line 353, in read
    for count, row in enumerate(rows, start=1):
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/table.py", line 215, in iter
    for row_number, headers, row in iterator:
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/table.py", line 509, in builtin_processor
    row, row_number=row_number, exc_handler=exc_handler)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/schema.py", line 266, in cast_row
    error_data=keyed_row)
  File "/home/didiez/anaconda3/lib/python3.7/site-packages/tableschema/helpers.py", line 90, in default_exc_handler
    raise exc
datapackage.exceptions.CastError: Row length 5 doesn't match fields count 6 for row "2"

If we only check 'structure' and 'schema' everything works as expected.
I could not find any reference in the docs about reading (and casting) csv files with fields not defined in the schema or missing fields (non-required columns in the schema).
Should I open an issue, or is it a known limitation when reading csv files with goodtables?

roll
@roll
@didiez Please open an issue. Goodtables should never fail anyway
roll
@roll
The problem is that the current implementation assumes the reference table is valid. We probably need to emit a foreign-key error here saying that the reference table is not valid (instead of failing)
Oleg Lavrovsky
@loleg
^ in case anyone here knows Workbench, please feel free to comment
Diego Díez Ricondo
@didiez
@roll issue added frictionlessdata/goodtables-py#347. As a workaround I ended up pre-processing the source csv with pandas to add/remove/reorder columns to make it valid against the schema
Matt Melton
@mattmelton
hi - I'm having issues with goodtables validate on a file-like object, specifically simple file IO via open("test.csv", "r")... validate('test.csv') works but validate(fileHandle) doesn't. The error I get is: "No such file or directory: 'inline'"
Egwuenu Gift
@lauragift21
Hi everyone, I'm happy to share that I recently joined the Frictionless Data team as a Developer Evangelist. My role involves spreading the word about Frictionless Data and encouraging community involvement. I'm always open to help and have discussions about Frictionless Data. https://www.datopian.com/blog/2020/03/20/joining-the-frictionless-data-team/
Matt Melton
@mattmelton
the error No such file or directory: 'inline' was masking the actual error. To read from a file stream, the stream must be binary, you must specify the format, and the file cannot live in a tmp path. Unfortunately the logic behind _local_file_not_found inside inspector.py - return urlparse(source).scheme == '' and not os.path.isfile(source) - masks the true error.
roll
@roll
Hi @mattmelton could you please create an issue?
Matt Melton
@mattmelton
sure
Matt Melton
@mattmelton
@roll #349
Matt Melton
@mattmelton
does the python validator support the rdfType field? as far as I can tell it doesn't do anything
Egwuenu Gift
@lauragift21
Hi, @roll I'm having issues getting goodtables js library to work. I have a demo here https://repl.it/@lauragift21/goodtables-js and this is the error returned when trying to validate a CSV file. (node:66) UnhandledPromiseRejectionWarning: Error: Can't create a job on API. Reason: "Error: Request failed with status code 403"
roll
@roll
@mattmelton No it doesn't at the moment
@lauragift21 Please create an issue and I'll investigate. goodtables-js is a goodtables.io wrapper, so something is wrong with the API endpoint
Egwuenu Gift
@lauragift21
Alright sure.
Matt Melton
@mattmelton
I'm trying to figure out a solution to validate the range of dates in a CSV. Unfortunately the Excel short date format varies by locale, i.e. dd/mm/yyyy in most of the world and mm/dd/yyyy in the US and US-centric places like Abu Dhabi. I might have to do multiple validation passes - the first to see if it's a well-formed date, a second to peek and auto-detect the format, and a third to validate the range, i.e. [dd/mm/yyyy, dd/mm/yyyy] or [mm/dd/yyyy, mm/dd/yyyy]. Has anyone attempted to solve a similar problem?