Zane Selvans
@zaneselvans
I'm getting some weird dependencies, coming from tableschema and then datapackage, that are preventing my conda environment from using Python 3.7. It tells me datapackage -> cchardet[version='>=1.0,<2.0'] -> python=3.6, but in the setup.py for datapackage I can see that the current requirement is only >=1.0, and that was updated some months ago.
So I don't know where the restrictive maximum version is coming from.
Zane Selvans
@zaneselvans
I created frictionlessdata/datapackage-py#237 in the datapackage-py repo. There's a mismatch between the requirements that show up in conda-forge and those enumerated in setup.py for v1.6.0. Not sure how that happened.
Zachary Trautt
@ztrautt
Hi! Is a tabular data package with both path and data valid and acceptable to tools where the data and CSV file are numerically equivalent?
Zachary Trautt
@ztrautt
Using a JSON validator I found the answer was no, it fails validation.
Rufus Pollock
@rufuspollock

Hi! Is a tabular data package with both path and data valid and acceptable to tools where the data and CSV file are numerically equivalent?

generally not, you need one of them. BTW there is a convention that if you want to "cache" the value of a path file onto the Data Resource (a common practice e.g. for visualization) you store it in the _values attribute, see e.g. https://frictionlessdata.io/specs/views/
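Roughly, one or the other (illustrative descriptor fragments as Python dicts; the _values shape here is an assumption, check the views spec):

# Illustrative only: a resource references a file...
resource_with_path = {
    'name': 'my-resource',
    'path': 'data.csv',
    # ...and may cache its rows via the views-spec convention
    # (shape assumed, see https://frictionlessdata.io/specs/views/):
    '_values': [['a', 'b'], [1, 2]],
}

# ...or inlines the data, but not both "path" and "data":
resource_with_data = {
    'name': 'my-resource',
    'data': [{'a': 1, 'b': 2}],
}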

Oleg Lavrovsky
@loleg
@micimize @nabellaleen re: (GraphQL or other) API based on a TableSchema I wanted to ask if you've managed to make any headway on this idea, and to call your attention to https://github.com/dataletsch/panoptikum/blob/master/app.py which is my (quick & still relatively dirty) FlaskAPI / Pandas-DataPackage based attempt at a drop-in API for a TableSchema with search, sort, and filtering. I'm also keen to know if there's anyone working on a general solution to this?
Michael Joseph Rosenthal
@micimize
@loleg It's not something I'm working on
Oleg Lavrovsky
@loleg
Thanks for the heads up. I made another little prototype using the Falcon framework recently https://github.com/loleg/baumkataster-data/tree/master/api
I've been doing some thinking about the parallels between a Data Package-oriented solution design and OpenAPI-driven development, and it seems to me to be a valid use case. If there's any interest in having a lightweight standalone tool, let me know.
Victor Shih
@vshih

This - https://github.com/frictionlessdata/tableschema-pandas-py - suggests I should be able to pull an existing pandas dataframe into a data package:

>>> datapackage.pull_datapackage('/tmp/datapackage.json', 'country_list', 'pandas', tables={
...     'data': storage['data___data'],
... })

But apparently pull_datapackage is deprecated. Can someone suggest what the updated version of this should be?

roll
@roll
@vshih Hi, please take a look at - https://github.com/frictionlessdata/testsuite-extended/blob/master/tests/test_datapackage.py#L59-L73. It was integrated into the Package model, so you can initiate (pull) a package using one of the storages (SQL/Pandas/etc.) with package = Package(storage=storage), and save it (push) using the saving method, package.save(storage=storage)
Unfortunately the documentation regarding "drivers" is not properly consolidated yet
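Roughly, the pattern is (a sketch; the Storage import/constructor follows tableschema-pandas-py and may differ by version):

from datapackage import Package
from tableschema_pandas import Storage  # assumed driver entry point

# Pull: initiate a package from an existing storage (e.g. Pandas)
storage = Storage()
package = Package(storage=storage)
package.infer()

# Push: save the package back into a storage
package.save(storage=storage)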
Van Woods
@vdubya

@roll

502 error on https://goodtables.io ? Still happens to me >50% of the time and seems related to the cloudflare setup.

I've been gone for a while and was wanting to get back into goodtables.io. Unfortunately the Cloudflare 502 error is still happening.

(screenshot of the 502 error attached)
Rufus Pollock
@rufuspollock
@vdubya works for me atm ...
JD Bothma
@jbothma
Hi folks - it looks like goodtables is having some trouble https://imgur.com/FyDJkcQ.png
I'm getting the same error in OpenSpending as a result of this I think https://imgur.com/6a5Gzwr.png
jobarratt
@jobarratt
thanks for raising this @jbothma we're on it
JD Bothma
@jbothma
Thanks for the update
Adrià Mercader
@amercader
@jbothma would you mind trying again? I think it should work now
JD Bothma
@jbothma
on it
it worked just fine from OpenSpending packager - thanks @amercader !
Michael Joseph Rosenthal
@micimize

A module for mapping data packages to/from sqlalchemy would be super useful.

Could be used for getting to graphql also (@loleg): https://github.com/graphql-python/graphene-sqlalchemy

Some prior work in this gist: https://gist.github.com/LuizArmesto/025aeba8f5c6d6f058ee

roll
@roll
Hi, have you tried the tableschema-sql driver? It's not fully documented yet but it can be used with datapackage the same way as with tableschema - https://github.com/frictionlessdata/tableschema-py/blob/master/examples/table_sql.py
For example, mapping from a database:
from sqlalchemy import create_engine
from datapackage import Package

DATABASE_URL = 'postgresql://roll:roll@localhost:5432/movies'

# Map existing database tables onto package resources
descriptor = {'resources': [{'path': 'countries'}, {'path': 'locations'}]}
package = Package(descriptor, storage='sql', engine=create_engine(DATABASE_URL))
package.infer()  # infer each resource's schema from its database table
package.save('descriptor.json')
It's a different question whether SQLAlchemy models themselves need to be used/populated, instead of reading from/writing to an actual database
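And the other direction (push) is roughly (sketch, reusing the imports and the assumed DATABASE_URL from above):

# Write the package's resources out as database tables
package = Package('descriptor.json')
package.save(storage='sql', engine=create_engine(DATABASE_URL))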
Michael Joseph Rosenthal
@micimize
@roll ah right, I remember seeing that earlier! Yeah I'll definitely look at it more

In the data resource spec, there are two inline data examples:

{
  "data": {
    "resource-name-data": [
      {"a": 1, "b": 2}
    ]
  }
}

and

{
   "resources": [
     {
        "format": "json",
        # some json data e.g.
        "data": [
           { "a": 1, "b": 2 }
        ]
     }
   ]
}

Are they both correct? Is the idea of the nested resource-name-data option to be analogous to "path": [ "myfile1.csv", "myfile2.csv" ]?

roll
@roll
@micimize TBH I'm not sure if this one is correct (cc @pwalsh):
{
  "data": {
    "resource-name-data": [
      {"a": 1, "b": 2}
    ]
  }
}
OK, sorry, I got it. It's an example of some abstract data resource. For a tabular data resource the data field must be an array - https://frictionlessdata.io/specs/tabular-data-resource/
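A valid tabular resource with inline data would look roughly like this (a sketch as a Python dict; the name and schema are illustrative):

from datapackage import Resource

# Sketch: inline tabular data as an array of row objects
# (the spec also allows an array of row arrays)
resource = Resource({
    'name': 'inline-example',
    'profile': 'tabular-data-resource',
    'data': [
        {'a': 1, 'b': 2},
        {'a': 3, 'b': 4},
    ],
    'schema': {'fields': [
        {'name': 'a', 'type': 'integer'},
        {'name': 'b', 'type': 'integer'},
    ]},
})
print(resource.read())  # expect [[1, 2], [3, 4]]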
Michael Joseph Rosenthal
@micimize
ah, ok
Oleg Lavrovsky
@loleg
@micimize cool, thanks for sharing the gist! Let's brainstorm the scope for a joint project ..
sunedin
@sunedin
Hi beautiful people. I am still on my journey of learning datapackage. Got a bit confused today. Could anyone confirm that when reading/validating data against a schema, it matches exactly by the order of fields in the schema, rather than by the names/labels of the fields?
roll
@roll
Hi, yes, the order is the only source for the mapping
sunedin
@sunedin
@roll thanks for the quick reply. Wonder what's the reason behind this design. My understanding is that JSON Schema doesn't enforce the order of properties. Or is there a way to ignore the order and map by names?
roll
@roll
@sunedin The reasoning behind it is that in the most general case tabular data can have duplicate headers (consider some Excel spreadsheets created by non-tech users) or no headers at all (so the only way to map it is by order). Though the validity of this point has been discussed many times. There is an idea that rebasing on a mapping instead of an ordered list would make things much easier with limited downside. But for now it's only a discussion, not even a proposal (because it would require huge changes everywhere, I mean implementation-wise)
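For example, with tableschema-py the casting is purely positional (a quick sketch):

from tableschema import Schema

schema = Schema({'fields': [
    {'name': 'id', 'type': 'integer'},
    {'name': 'name', 'type': 'string'},
]})

# Values are cast against the schema's field order, regardless of
# whatever header labels the source file carries:
print(schema.cast_row(['1', 'english']))  # [1, 'english']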
sunedin
@sunedin
@roll Thanks, great explanation. A quick one: any existing implementation/suggestion for a 'unit' property (such as specifying kg, stone, etc.)?
roll
@roll
@sunedin You mean something like this - frictionlessdata/specs#607? Not sure about implementations though, as it's still being discussed
sunedin
@sunedin
@roll Seems pretty much like what I have been thinking about. Will dig into the details and probably see what I can do to move it forward if it still has wider interest.
Michael Joseph Rosenthal
@micimize

@loleg

Let's brainstorm the scope for a joint project ..

@loleg I'm currently playing around with tableschema-backed attrs class generation and a corresponding mypy plugin. The plugin is difficult. I'm interested in what it would take to make a strongly typed, performant data pipeline, possibly with Apache Beam compatibility
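Something in this spirit (a very rough sketch; the helper and type mapping are hypothetical, not an existing API):

import attr

# Hypothetical: build an attrs class from a Table Schema descriptor
TYPE_MAP = {'integer': int, 'number': float, 'string': str, 'boolean': bool}

def class_from_schema(name, descriptor):
    attributes = {
        field['name']: attr.ib(type=TYPE_MAP.get(field.get('type'), object))
        for field in descriptor['fields']
    }
    return attr.make_class(name, attributes)

Row = class_from_schema('Row', {'fields': [
    {'name': 'id', 'type': 'integer'},
    {'name': 'name', 'type': 'string'},
]})
print(Row(id=1, name='english'))  # Row(id=1, name='english')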

Stefano
@sabas
hello
just to confirm, do datasets now require a header row in the data?
roll
@roll
Hi @sabas you mean for a tabular resource?
I guess you can still provide a file without headers if you set dialect.header=false - http://frictionlessdata.io/specs/csv-dialect/
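E.g. roughly (a sketch; the file name and fields are assumed):

from datapackage import Resource

# A tabular resource whose CSV has no header row: with
# dialect.header=false the field names come from the schema,
# mapped to columns by position
resource = Resource({
    'name': 'headerless',
    'path': 'data.csv',  # assumed file
    'dialect': {'header': False},
    'schema': {'fields': [
        {'name': 'id', 'type': 'integer'},
        {'name': 'name', 'type': 'string'},
    ]},
})
print(resource.read(keyed=True))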
Oleg Lavrovsky
@loleg
@micimize :+1: will do