roll
@roll
On the other hand, the ecosystem is very Data Package centric (e.g. datahub.io stores packages, not resources), so even having the Data Resource specification use a `package` property feels kind of natural.
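For illustration, a minimal sketch of what a standalone Data Resource descriptor with such a `package` back-reference might look like; the `package` property is the idea under discussion, not a published spec field, and the URL is made up:

```python
# Hypothetical sketch only: a standalone Data Resource descriptor that
# points back to its parent Data Package via a proposed "package"
# property. The "package" key is the idea under discussion, not a
# published spec field; the URL is made up.
resource_descriptor = {
    "name": "licenses",
    "path": "licenses.csv",
    "format": "csv",
    "package": "https://datahub.io/core/licenses/datapackage.json",
}
```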
Stephen Gates
@Stephen-Gates
My real-world example: every year, every government department must publish the same data tables in their annual reports. These are also published as open data. An Excel spreadsheet is sent around to all departments to enter the data. The spreadsheet has validation rules to control the data entered (think enum constraint or foreign-key lookup). With Frictionless Data, a template Data Package with a Table Schema is circulated to aid in collecting data. This references a table on CKAN that contains the foreign-key lookup values. The data is validated by each department, and the data is consistent for every department. The following year there may be a change to the lookup table; I assume this change would result in a new URL for the dataset in CKAN (I could be wrong here). The process is repeated to collect and validate the data. Hope that helps.
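For readers following along, a rough sketch of the kind of Table Schema this workflow implies, using the Python tableschema library; all field names and values are made up, and the in-package `departments` resource stands in for the remote CKAN lookup table described above:

```python
# A rough Table Schema sketch for the workflow above. All names are
# illustrative; "departments" stands in for the CKAN-hosted lookup table.
from tableschema import Schema  # pip install tableschema

descriptor = {
    "fields": [
        {"name": "department_code", "type": "string"},
        {
            "name": "status",
            "type": "string",
            # enum constraint: only these values pass validation
            "constraints": {"enum": ["draft", "published", "superseded"]},
        },
    ],
    # Foreign key pointing at a lookup table. In-package references use a
    # resource name; pointing at a remote (CKAN-hosted) table is exactly
    # the part that gets awkward when the dataset URL changes each year.
    "foreignKeys": [
        {
            "fields": "department_code",
            "reference": {"resource": "departments", "fields": "code"},
        }
    ],
}

schema = Schema(descriptor)
print(schema.valid)  # True if the descriptor itself is valid
```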
Kenji
@Kenji-K
Just popping in to say that I love what you guys are doing.
Kenji
@Kenji-K
Well maybe I should also use this opportunity to ask a question, given that I am just recently getting my feet wet with respect to this subject. What is the relationship between the FD specs and Dublin Core? I have an idea in my head but I'd rather not muddy the waters with my misconceptions and will refrain from putting it here.
Stephen Gates
@Stephen-Gates

Hi, I'm updating the list of open licenses and its data package in https://github.com/okfn/licenses.

I'm validating my changes with Goodtables.io at http://goodtables.io/github/Stephen-Gates/licenses

In my PR okfn/licenses#57, I accidentally introduced an error by misspelling "superseded" as "superceded". Goodtables.io didn't fail the data despite my enum constraint.

I'm wondering if there's an error in my Table Schema, or if goodtables has a bug?
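For context, this kind of check can be reproduced locally with the goodtables Python library; the file names here are assumptions, and the report layout may differ between versions:

```python
# Reproducing the enum check locally with goodtables-py.
# File names are assumptions for illustration.
from goodtables import validate  # pip install goodtables

report = validate('licenses.csv', schema='table-schema.json')
print(report['valid'])        # False once any check fails
print(report['error-count'])  # e.g. an enum violation for "superceded"
```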

roll
@roll
@Stephen-Gates Thanks. I'm looking into it
roll
@roll
@Stephen-Gates It should work now
Stephen Gates
@Stephen-Gates
Thanks @roll. I just pushed a change to licenses.csv with one error remaining ("superseded", L71), but it still passed the goodtables test.
Martín n
@martinszy
Hello, I'm testing datapackage-pipelines.
Rufus Pollock
@rufuspollock
@martinszy hey there, that's great!
Martín n
@martinszy
I have a couple of questions:
1) I have changing filenames; is there any way to use wildcards in add_resource?
2) Is it a bad idea to have a dump.to_ckan action?
3) Does ckan handle datapackages yet?
@amercader and @brew can answer any questions you have about the CKAN integrations next week
Martín n
@martinszy
thanks, I'll check that out
Martín n
@martinszy
For 1) I'm thinking: either create a pre-process that generates the YAML file, or modify datapackage-pipelines to allow for custom_resource_adders or something like that, which would listen for multiple files and then trigger the rest of the process... but I've not analyzed this properly yet.
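That pre-process could be as simple as a script that globs the changing filenames and rewrites the pipeline spec before each run. A rough sketch, with illustrative paths, pipeline name, and step parameters:

```python
# Rough sketch: glob the changing filenames and write pipeline-spec.yaml
# for datapackage-pipelines before each run. Paths, pipeline name and
# step parameters are illustrative assumptions.
import glob
from pathlib import Path

import yaml  # pip install pyyaml

steps = [{'run': 'add_metadata', 'parameters': {'name': 'my-dataset'}}]
for path in sorted(glob.glob('incoming/*.csv')):
    steps.append({'run': 'add_resource',
                  'parameters': {'name': Path(path).stem, 'url': path}})
steps.append({'run': 'stream_remote_resources'})
steps.append({'run': 'dump.to_path', 'parameters': {'out-path': 'output'}})

with open('pipeline-spec.yaml', 'w') as f:
    yaml.safe_dump({'my-pipeline': {'pipeline': steps}}, f,
                   default_flow_style=False)
```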
roll
@roll
@Stephen-Gates Could you please try again?
Stephen Gates
@Stephen-Gates
@roll working perfectly (see Job History). The PR okfn/licenses#57 is now good to go. Thanks so much for your help.
Brook Elgie
@brew
@martinszy Don't forget you can generate pipelines dynamically using a "Generator": https://github.com/frictionlessdata/datapackage-pipelines/#plugins-and-source-descriptors Perhaps this would help with the flexibility you're after?
Martín n
@martinszy
@brew awesome! I'll check it out!!
Oleg Lavrovsky
@loleg
Good morning & greetings from a park bench (47.37226921, 8.54668868) next to the office of data.stadt-zurich.ch, where I'm starting my new project with you today.
I'm starting with a review of the Python implementation, which I already started testing last week in combination with a tool we use to work with data providers & teams at hackathons. Thanks @callmealien @pwalsh @jobarratt for your help getting plugged in so far! I'll be keeping an open dev log, and you can mention me here anytime if you have questions or suggestions.
Vitor Baptista
@vitorbaptista
@loleg Good luck with the new project! Do you plan on publishing your open dev log somewhere?
Oleg Lavrovsky
@loleg
@vitorbaptista absolutely, it'll be on GitHub at least in raw form later today
Vitor Baptista
@vitorbaptista
@loleg Cool! Please let me know when it's online
Oleg Lavrovsky
@loleg
@vitorbaptista thanks for your enthusiasm :) https://github.com/loleg/devlog/tree/master/content
Vitor Baptista
@vitorbaptista
:tada: :smile:
Oleg Lavrovsky
@loleg
Are there notes anywhere of the roots of the standard, specifically how much it owes to (and potentially influences the future of) https://github.com/ckan/ckan/blob/master/ckan/logic/schema.py#L206 ?
Oleg Lavrovsky
@loleg
(or is this a touchy topic I should leave to later discussion..)
Stephen Gates
@Stephen-Gates

Hi, I'm looking for test datapackage.zip files. I came across https://github.com/frictionlessdata/testsuite-extended and https://github.com/frictionlessdata/example-data-packages but these didn't help. I couldn't find datapackage.zip files on datahub.io either.

Any suggestions on a source for data package test data? If not, where is the best place to contribute some?
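For anyone searching later, one way to generate a test datapackage.zip yourself is with the datapackage Python library; the file layout here is an assumption:

```python
# Generating a test datapackage.zip with datapackage-py.
# The CSV glob is an assumption; point it at any local data.
from datapackage import Package  # pip install datapackage

package = Package({'name': 'test-package'})
package.infer('data/*.csv')      # add resources and guess their schemas
package.save('datapackage.zip')  # a .zip target produces a zipped package
```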

Rufus Pollock
@rufuspollock

Are there notes anywhere of the roots of the standard, specifically how much it owes to (and potentially influences the future of) https://github.com/ckan/ckan/blob/master/ckan/logic/schema.py#L206 ?

Not a touchy topic at all. If you go back to the pre-history of Frictionless Data in 2007 or so, then yes: CKAN metadata was partially inspired by Python packaging, and so was the first "dpm" (data package manager). In fact, as you may know, CKAN was originally intended to act like PyPI, CPAN, CRAN, etc.

Over time, the source of inspiration for data packages has shifted a bit towards more recent packaging systems like Node + package.json (PyPI was probably not the best initial inspiration).

This is something that would probably be worth a discuss.okfn.org question, so we can write it up there for posterity :-)

Oleg Lavrovsky
@loleg
Got it, will do, thanks Rufus!
Meiran Zhiyenbayev
@Mikanebu

Core Data: Essential Datasets for Data Wranglers and Data Scientists

This post introduces you to Core Data, presents a couple of examples, and shows you how you can access and use core data easily from your own tools and systems, including R, Python, Pandas, and more.
http://datahub.io/blog/core-data-essential-datasets-for-data-wranglers-and-data-scientists

This is a blog post. To read the full text, please follow the link above.

Jeremy Palmer
@palmerj
Hi All!
I was wondering what it would take to be involved in the next development of the specifications, in particular for the Tabular Data Package.
For the Data Service that we manage, we have been looking for a standard way of better describing CSV metadata, and the package schema defined here looks great.
The main issue is adding full spatial data type support.
Jeremy Palmer
@palmerj
At the moment, point and GeoJSON support is there in a basic form, but things like the spatial extent of a dataset, the spatial reference system, and the vector geometry type (e.g. polygon, point, linestring) need to be added to the schema to make it work properly in the geospatial world.
Thanks!
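To make the gap concrete: a sketch of a Table Schema using today's built-in geopoint and geojson types alongside the kind of extra properties being asked for. The geometryType, spatialExtent, and spatialReferenceSystem names below are hypothetical, not spec fields:

```python
# Sketch only: "geopoint" and "geojson" are real Table Schema field
# types; geometryType, spatialExtent and spatialReferenceSystem below
# are hypothetical extensions of the kind being requested.
descriptor = {
    "fields": [
        {"name": "site", "type": "string"},
        {"name": "location", "type": "geopoint"},  # "lon,lat" by default
        {"name": "boundary", "type": "geojson",
         "geometryType": "Polygon"},               # hypothetical
    ],
    "spatialExtent": [5.9, 45.8, 10.5, 47.8],      # [minX, minY, maxX, maxY]
    "spatialReferenceSystem": "EPSG:4326",         # hypothetical
}
```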
Stephen Gates
@Stephen-Gates
Hi @palmerj, it's great that you're keen to contribute to the spatial aspects of Data Packages. I started contributing by commenting on or raising issues on GitHub, and through discussions on the forum or here. I wrote this guide to try to progress spatial data in packages. You may be interested in frictionlessdata/specs#86.
Stephen Gates
@Stephen-Gates
Given @palmerj's question about contributing, I notice that many Frictionless Data repositories don't have the recommended community files. In the yet-to-be-accepted PR okfn/licenses#57, I added a code of conduct, contributing guidelines, and other community files. Is it appropriate to add these to the repos, or are there standard templates to apply to OK repositories?
Rufus Pollock
@rufuspollock

Hi All!

Welcome!

I was wondering what it would take to be involved in the next development of the specifications, in particular for the Tabular Data Package.

You've taken the first step! We welcome contributions and new curators of the specifications.

At the moment, point and GeoJSON support is there in a basic form, but things like the spatial extent of a dataset, the spatial reference system, and the vector geometry type (e.g. polygon, point, linestring) need to be added to the schema to make it work properly in the geospatial world.

We'd really welcome your help here - be it on improving geo in the tabular spec or on the separate WIP geo spec.

Rufus - co-lead curator of the Frictionless Data Specs

@Stephen-Gates first, a huge appreciation of your ongoing contributions here -- you are definitely a candidate for curator :-)

Given @palmerj's question about contributing, I notice that many Frictionless Data repositories don't have the recommended community files. In the yet-to-be-accepted PR okfn/licenses#57, I added a code of conduct, contributing guidelines, and other community files. Is it appropriate to add these to the repos, or are there standard templates to apply to OK repositories?

@Stephen-Gates I'd say generally yes. For all the OKI stuff, I'd recommend first suggesting it on okfn/chat or the forum. For Frictionless, if you could do a draft (or pull your existing files out into a PR), we can review.

Mamadou Diagne
@genova
@Mikanebu @rufuspollock we have an organisation with some data packages. I would like to publish data under the organisation's name, not my username, like this: https://datahub.io/organisation_name/dataset. Thank you!
Rufus Pollock
@rufuspollock
@genova can you re-ask this in gitter.im/datahubio/chat - as that is the primary chat channel for datahub questions :-)
Meiran Zhiyenbayev
@Mikanebu
@genova Great! At the moment, organization accounts are provided manually. Let's have a chat or schedule a short call so I can assist you in creating an account.
Oleg Lavrovsky
@loleg
Latest devlog posted from the Julia project, comments welcome https://github.com/loleg/devlog/blob/master/content/2017-11-06-Community.md
Matthew Thompson
@cblop
Hi all, I'm reasonably new to this remote style of development, so please bear with all of my questions!
I'm working on the Clojure implementations of tableschema and datapackage. I'm starting off with the code for the type casting in tableschema-clj (doing it with clojure.spec as I go). Should I be pushing to the repo every time I get a bunch of tests passing, or get most of it working offline first and then push it all in one huge update?
If I push the code bit by bit, then obviously you can look at it, but nobody will be able to actually use the code for a while.