Ethan Welty
@ezwelty
@roll OK, so it sounds like you are of the opinion that missing values should be ignored in uniqueness constraints. (I don't agree that this is clear from the specs, since "value" in "all values for that field MUST be unique" could be understood to include null values.) tableschema-py does what you say it should. However, goodtables-py does the opposite: two or more null values in a field trigger a unique-constraint error.
roll
@roll
@ezwelty Thanks. It must be a bug in goodtables-py
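To make the two readings of the uniqueness constraint concrete, here is a minimal sketch in plain Python. The names `find_duplicates` and `ignore_missing` are hypothetical and are not part of tableschema-py or goodtables-py; the sketch only illustrates the difference in behaviour described above.

```python
def find_duplicates(values, ignore_missing=True):
    """Return the values that occur more than once, in first-repeat order.

    With ignore_missing=True, None (a missing value) never counts as a
    duplicate. With ignore_missing=False, None is treated like any other value.
    """
    seen = set()
    duplicates = []
    for value in values:
        if value is None and ignore_missing:
            continue
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


column = ["a", None, "b", None, "a"]

# Reading 1: missing values are ignored by the constraint.
ignoring_nulls = find_duplicates(column)

# Reading 2: two missing values violate the constraint.
counting_nulls = find_duplicates(column, ignore_missing=False)
```

Under the first reading only `"a"` is reported; under the second, the repeated `None` is reported as well.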
Augusto Herrmann
@augusto-herrmann

It seems that the badge looks only at the latest commit, regardless of which branch it is.

I've added a new commit to the branch with data that now validates. So now the badge shows the data as valid in all branches.

Johannes Jander
@iSnow
Hi guys, I believe the JSON schema files for data-package.json and data-resource.json on the frictionlessdata site are wrong and contradict the specs with respect to how a schema for a data resource can be given. I opened a ticket (frictionlessdata/specs#645) and would ask you to check whether I am right here or seeing things.
Johannes Jander
@iSnow
The same is true for dialect BTW.
roll
@roll
Thanks @iSnow. I've added some historical notes to this issue
Karik Isichei
@isichei
Hi here,
I am trying to use goodtables in Python to validate a single row and am struggling to get it working with the internals. A bit of background: I am using the validation part of goodtables to identify where my data is malformed. I then run a set of functions against the errored row in an attempt to fix it. Ideally I would then like to do something like validate_row(row, schema) so that I can see whether the fix worked. At the moment the simplest solution seems to be to create a new descriptor that has the single row as its data, but that seems inefficient to me. I just wanted to check that I haven't missed anything: is there an easier or more efficient way to validate a single row?
Thanks
roll
@roll
Hi @isichei, please try something like this: goodtables.validate([['1', 'test']])
It's not a very performant solution, though. I would rather use tableschema.Schema.cast_row - https://github.com/frictionlessdata/tableschema-py#schemacast_rowrow
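The idea of casting one row against a list of field definitions can be sketched without either library. The `validate_row` helper and the `CASTERS` table below are hypothetical and deliberately tiny; they are not the tableschema-py API, which should be preferred in practice.

```python
# Hypothetical mapping from a few Table Schema type names to Python casters.
CASTERS = {
    "integer": int,
    "number": float,
    "string": str,
}


def validate_row(row, fields):
    """Cast one row by position; return (cast_row_or_None, errors)."""
    if len(row) != len(fields):
        return None, [f"expected {len(fields)} cells, got {len(row)}"]
    errors = []
    cast = []
    for cell, field in zip(row, fields):
        try:
            cast.append(CASTERS[field["type"]](cell))
        except ValueError:
            errors.append(
                f"{field['name']}: cannot cast {cell!r} to {field['type']}"
            )
            cast.append(None)
    return (cast if not errors else None), errors


fields = [
    {"name": "id", "type": "integer"},
    {"name": "name", "type": "string"},
]

# A row that casts cleanly, and one that still needs fixing.
ok_row, ok_errors = validate_row(["1", "test"], fields)
bad_row, bad_errors = validate_row(["x", "test"], fields)
```

A fix-then-recheck loop can call such a function on the repaired row alone instead of building a new descriptor each time.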
Paul Girard
@paulgirard
Hi folks, I can't find any reference to the group concept in the specification. It's clearly defined in the datapackage-py documentation, for instance, but I can't find it on frictionlessdata.io.
roll
@roll
Hi @paulgirard, it's a software-level extension at the moment
Paul Girard
@paulgirard
OK, but it might end up in the spec at some point, right?
roll
@roll
I think more probably as a pattern but we haven't discussed it yet. Is it useful for you?
Paul Girard
@paulgirard
Well, actually I am using it. I wanted to check the specs and couldn't find it.
I think it's useful for data packages with large volumes of data
Paul Girard
@paulgirard
Dear @roll, I am surprised that the resource CSV columns have to be in the same order as the fields in the schema. Since the documentation is also based on the field/column name, I don't get why the order is important.
Paul Girard
@paulgirard
Dear all, I am organising a thematic track at this year's FOSDEM (Free Open Source Software Developer European Meeting) conference called "Open Research Tools and Technologies". Having the frictionless data tools presented there would be neat! Feel free (and actually encouraged) to apply to our call for presentations. Deadline: 1st December 2019. The conference is in Brussels on Saturday 1st February 2020. Details here: https://research-fosdem.github.io/
Ping me if you have any questions.
roll
@roll
@paulgirard It's based on the order because it covers more use cases e.g. headless tables or tables with duplicate headers
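A small sketch of why positional matching covers the duplicate-header case. The column names and helper functions are made up for illustration; this is not library code.

```python
def map_by_position(field_names, row):
    """Pair each cell with the schema field at the same index."""
    return list(zip(field_names, row))


def map_by_header(headers, row):
    """Key each cell by its header; later duplicates overwrite earlier ones."""
    return dict(zip(headers, row))


# The schema can give distinct names to columns whose file headers collide.
field_names = ["id", "value_2019", "value_2020"]
duplicate_headers = ["id", "value", "value"]
row = ["1", "10", "20"]

by_position = map_by_position(field_names, row)
by_header = map_by_header(duplicate_headers, row)
```

Matching by position keeps all three cells, while matching by header silently drops one of the two `value` columns. A headless table has no header row to match against at all.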
That's great! cc @lwinfree
Lilly Winfree
@lwinfree
Thanks for the encouragement @paulgirard! We’ll definitely submit for a talk.
Paul Girard
@paulgirard

Thanks for the encouragement @paulgirard! We’ll definitely submit for a talk.

Great !

Mark Pinkerton
@mpinkerton-oasis
Hi. I am working on a proof of concept using Frictionless Data for capturing/sharing complex data for risk modelling. I have a few general questions and would appreciate any assistance:
  • Is it possible to reuse table schemas between data packages, perhaps by including them from an external file?
  • Is it possible to reuse tables across data packages? We have rich domain data that we need in order to validate the data.
  • Is it possible to define compound foreign key constraints?
Adrià Mercader
@amercader
Hi @mpinkerton-oasis, @roll will confirm these, but my understanding is that the answer is yes to all:
  1. You can use a URL for the schema property of resources, pointing to a JSON Table Schema file hosted elsewhere
  2. Do you mean different data packages having resources that point to the same data file? If so, I don't see why not, as the path property can also be a URL
  3. According to the spec (https://frictionlessdata.io/specs/table-schema/#foreign-keys) you can define a list of fields both in the source and the destination of the foreign key, so that should cover compound FKs, unless I misunderstood
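The three points above could look roughly like this in a descriptor, built here as a Python dict and serialised to JSON. The package name, resource names, field names, and example.com URLs are all placeholders, not real data.

```python
import json

descriptor = {
    "name": "risk-model-inputs",  # placeholder package name
    "resources": [
        {
            # Points 1 and 2: both the schema and the data are referenced
            # by URL, so other packages can point at the same files.
            "name": "locations",
            "path": "https://example.com/shared/locations.csv",
            "schema": "https://example.com/shared/locations.schema.json",
        },
        {
            "name": "exposures",
            "path": "exposures.csv",
            "schema": {
                "fields": [
                    {"name": "country", "type": "string"},
                    {"name": "region", "type": "string"},
                    {"name": "value", "type": "number"},
                ],
                # Point 3: a compound foreign key lists several fields on
                # both sides, matched by position.
                "foreignKeys": [
                    {
                        "fields": ["country", "region"],
                        "reference": {
                            "resource": "locations",
                            "fields": ["country_code", "region_code"],
                        },
                    }
                ],
            },
        },
    ],
}

descriptor_json = json.dumps(descriptor, indent=2)
```

Writing `descriptor_json` to a `datapackage.json` file gives a package whose second resource is checked against the shared first one.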
roll
@roll
@amercader :thumbsup:
Mark Pinkerton
@mpinkerton-oasis
Thanks @amercader, on my second point the use case is having one data package that has a collection of domain data that is then used for key validation in a range of other data packages. I'm not sure if there is an established pattern for this, but if the schema and data can both be set as paths that will certainly help.
roll
@roll

@mpinkerton-oasis
We have recently added external FK support to goodtables:

The second example shows how it uses the country-codes data package as a reference table.

Mark Pinkerton
@mpinkerton-oasis
That sounds ideal - I'll try it out.
Paul Girard
@paulgirard
Dear all, a quick question. Do you agree that for a set of resources gathered in the same group, the schema can be set only for the first resource of the group?
Paul Girard
@paulgirard
Well, actually my current issue could also be solved by changing the datapackage-js behaviour. Currently, with a package that has a group of resources, if a schema is indicated for every resource in the group (more than 1000 in my case), the lib will load the same schema as many times as there are resources,
which is not ideal, above all if the basepath is remote.
Thus I see two ways out:
  • I change my package by removing the schema from all grouped resources but the first one
  • I update datapackage-js to load only the first schema if resources are in a group
Any comments?
Rufus Pollock
@rufuspollock
@paulgirard this is a great question - i think it will require a bit of thought - can you post this in our new trial forum for technical questions in github issues https://github.com/frictionlessdata/forum/issues
Paul Girard
@paulgirard
Yes I can
Note : I'll try the first approach as a quick workaround but I can do the dev for the second path if validated
Paul Girard
@paulgirard
@rufuspollock Here you are : frictionlessdata/forum#1
ah ah I just wrote the first issue ;-)
Rufus Pollock
@rufuspollock
:clap:
Paul Girard
@paulgirard
By the way @lwinfree, very happy to have seen your proposal in our devroom at the FOSDEM (see https://research-fosdem.github.io/). We finished reviewing this morning : you are in !
Glad to meet you and share frictionless data with FOSDEM audience in Brussels next year :)
Rufus Pollock
@rufuspollock
@paulgirard i’ll comment there too - one point reading your question is that you have a use case for chunks. There seem to be several solutions …
  1. Use chunks (but that is not yet properly supported in the libs)
  2. Cache the remote url so you don’t do a network call a 1000x
  3. … maybe others ...
Paul Girard
@paulgirard
Are chunks = multipart? I thought of using multipart resources, but the no-header requirement is a no-go for my use case.
Yes, caching would do it. We could add a caching mechanism to datapackage-js so that it doesn't load the same schema file twice,
which could maybe be done easily with https://github.com/jin5354/axios-cache-plugin
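The caching idea, sketched in Python for illustration (the thread concerns datapackage-js, and this is not its code). `make_cached_loader` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a real network call so the number of fetches can be counted.

```python
def make_cached_loader(fetch):
    """Wrap a fetch function so each URL is fetched at most once."""
    cache = {}

    def load(url):
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]

    return load


calls = []


def fake_fetch(url):
    """Stand-in for a remote request; records every call it receives."""
    calls.append(url)
    return {"fields": [{"name": "id", "type": "integer"}]}


load_schema = make_cached_loader(fake_fetch)

# 1000 grouped resources that all reference the same remote schema.
resources = [
    {"name": f"part-{i}", "schema": "https://example.com/schema.json"}
    for i in range(1000)
]
schemas = [load_schema(resource["schema"]) for resource in resources]
```

All 1000 resources receive the same schema object while only one fetch is made.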
Rufus Pollock
@rufuspollock
@paulgirard you can have headers - the no-header point is not a requirement ...
Rufus Pollock
@rufuspollock
i’ve commented here frictionlessdata/forum#1
Paul Girard
@paulgirard
Oh, in that case multipart with headers is the best for me. It will make for a much lighter datapackage.
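A rough sketch of reading a multipart resource whose chunks each carry a header row. How the libraries actually treat repeated headers is not stated in this conversation, so dropping them here is an assumption, and `read_multipart` is a hypothetical helper.

```python
import csv
import io


def read_multipart(chunks):
    """Concatenate CSV chunks, keeping the header from the first chunk only.

    Assumes every non-empty chunk starts with the same header row.
    """
    header = None
    rows = []
    for chunk in chunks:
        reader = csv.reader(io.StringIO(chunk))
        chunk_header = next(reader, None)
        if chunk_header is None:
            continue  # empty chunk, nothing to add
        if header is None:
            header = chunk_header
        elif chunk_header != header:
            raise ValueError(f"header mismatch: {chunk_header} != {header}")
        rows.extend(reader)
    return header, rows


# Two in-memory chunks standing in for the files listed in a resource path.
chunks = ["id,name\n1,a\n2,b\n", "id,name\n3,c\n"]
header, rows = read_multipart(chunks)
```

With this layout the descriptor needs a single resource entry and a single schema, however many chunk files there are.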