These are chat archives for frictionlessdata/chat
Tidy Data by Hadley Wickham http://vita.had.co.nz/papers/tidy-data.pdf
A huge amount of effort is spent cleaning data to get it ready for analysis, but there
has been little research on how to make data cleaning as easy and effective as possible.
This paper tackles a small, but important, component of data cleaning: data tidying.
Tidy datasets are easy to manipulate, model and visualise, and have a specific structure:
each variable is a column, each observation is a row, and each type of observational unit
is a table. This framework makes it easy to tidy messy datasets because only a small
set of tools are needed to deal with a wide range of un-tidy datasets. This structure
also makes it easier to develop tidy tools for data analysis, tools that both input and
output tidy datasets. The advantages of a consistent data structure and matching tools
are demonstrated with a case study free from mundane data manipulation chores.
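To make the structure described in the abstract concrete, here is a minimal sketch of one tidying step: turning a "wide" table, where column headers hold values of a variable, into one row per observation. It is not code from the paper. The `melt` helper, the sample data, and the column names (`country`, `year`, `count`) are all hypothetical and chosen only for illustration.

```python
def melt(rows, id_keys, var_name, value_name):
    """Reshape wide rows into long rows: one output row per (id, column) pair.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tidy = []
    for row in rows:
        # Columns that identify the observational unit are copied to every output row.
        ids = {k: row[k] for k in id_keys}
        for key, value in row.items():
            if key in id_keys:
                continue
            # The old column header becomes a value of the new variable column.
            tidy.append({**ids, var_name: key, value_name: value})
    return tidy


# Made-up messy table: "2019" and "2020" are values of a variable (year),
# not variable names, so each row mixes two observations.
messy = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

tidy = melt(messy, id_keys=["country"], var_name="year", value_name="count")
for record in tidy:
    print(record)
```

After the reshape, each variable (`country`, `year`, `count`) is a column and each observation is a row, which is the layout the abstract calls tidy.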
I mean, the specs go far beyond the paper, which seems to solve a very basic problem that the specs expect to be solved already. I’m assuming that from what the author writes:
The principles of tidy data are closely tied to those of relational databases and Codd’s relational algebra
Then he mentions SQL and touches on some other approaches or tools (all of which are just frameworks or methods). However, the rest of the paper is more about putting free-form data into a table or aggregated pivot tables than about actual data cleaning (as it is understood in the field).
Hi. I'm interested in applying for the Frictionless Data Tool Fund for a C++ implementation, as that is generally my language of choice. I have a few questions regarding that: