As is probably clear from the name of our organization, our general research revolves around the equation model F(X) = Y, and around organizing resources (such as datasets) that help humanity collectively apply that model, coordinating itself by solving the equation together.
For every research question in every field, scientists try to put their datasets into the form of an equation: to estimate relations, make predictions, and find optimal parameters for desired outcomes. However, to do these statistical analyses (regressions, extrapolations, interpolations, deep modeling, etc.), they need clean datasets, and to act on the results, they need reliable APIs.
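As a minimal illustration of the F(X) = Y workflow, here is a fit of a simple linear model y = a*x + b to a small, clean dataset by ordinary least squares. The data points are invented purely for illustration.

```python
# Fit y = a*x + b by minimizing sum((a*x + b - y)**2) over a, b.

def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.1, 7.9]   # roughly y = 2x, with a little noise
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # → 1.95 0.15
```

Clean data is what makes even this tiny example possible: if the columns were mislabeled or the units inconsistent, the fitted parameters would be meaningless.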
In our research, we develop generic tools that help us achieve goal alignment through the alignment of datasets and API systems. This work is broadly related to the efforts at https://datahub.io in making datasets reusable like packages; however, we extend it with format normalization and systems-driver (i.e., API-level interaction) normalization, and we also contribute work on specific data resources of interest for the alignment of global goals. Check out my current worksheet here, where I'm trying to list a few datasets that could help answer the "WeFindX Questions" in the context of my country and the world. Feel free to copy it over and share your versions of the datasets of interest.
Right now, the main piece of directly applicable work for normalizing datasets is at https://github.com/wefindx/metaform . Examples are included; feel free to try it. I think it would be fun to use it in combination with datahub, and to provide immediately usable datasets for competitions at Kaggle, Kesci, Signate, Chahub, Grand-Challenges, DataFountain, DCJingSai and others, because the diversity of their datasets is a good test of our general alignment and normalization paradigms, as well as of our frameworks for writing resource drivers.
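To make the normalization idea concrete, here is a hedged sketch in plain Python (this is not metaform's actual API; the field names, mappings, and sources are all hypothetical): records from two sources that use different field names are mapped onto one shared schema, after which they compare equal.

```python
# Illustrative only: normalize records that use different local field
# names into one shared schema via per-source key mappings.

SHARED_SCHEMA = {"name": str, "price": float, "currency": str}

# Hypothetical per-source mappings from local field names to schema keys.
MAPPINGS = {
    "shop_a": {"title": "name", "cost": "price", "ccy": "currency"},
    "shop_b": {"item": "name", "amount": "price", "curr": "currency"},
}

def normalize(record, source):
    mapping = MAPPINGS[source]
    renamed = {mapping[k]: v for k, v in record.items() if k in mapping}
    # Coerce values to the types declared in the shared schema.
    return {k: SHARED_SCHEMA[k](v) for k, v in renamed.items()}

a = normalize({"title": "Lamp", "cost": "9.5", "ccy": "EUR"}, "shop_a")
b = normalize({"item": "Lamp", "amount": 9.5, "curr": "EUR"}, "shop_b")
print(a == b)  # → True: both sources collapse to the same record
```

Once records from arbitrary sources collapse to the same normalized form, they can be fed into any downstream analysis or competition pipeline interchangeably.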
It may also make sense to meet with the datahub.io creators once our proposed meta-format standards settle, to agree on a convenient metaformat for data schema specifications.
There are two approaches to package-like datasets: one is to package them (like datahub.io does); another is to provide versioned schemas and drivers for the original downloadable datasets and API resources. For example, suppose you need to run some trading of loans: a driver with a normalizer may solve both the data-versioning problem and the control problem. Providing drivers with normalizers is thus an approach that can turn any web resource into a "datahub". E.g., YouTube has videos, and it could be a datahub for videos, with versioning and all, if a versioned schema normalizer were added to the YouTube API driver, making arbitrary web resources behave like datahubs.
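The "driver + normalizer" idea above can be sketched as follows. This is a hypothetical, minimal sketch, not metadrive's or any real API's interface: a driver wraps a resource's raw API, and a versioned schema normalizer maps each raw item into a stable record format, so consumers see the same records even as the underlying API evolves.

```python
# Hedged sketch: all names, fields, and the canned response are invented.

SCHEMA_VERSION = "video/0.1"

def normalize_video(raw):
    """Map a raw API item into the versioned shared schema."""
    return {
        "_schema": SCHEMA_VERSION,
        "id": raw["id"],
        "title": raw["snippet"]["title"],
        "published": raw["snippet"]["publishedAt"],
    }

class VideoDriver:
    """Driver for a hypothetical video API. fetch() would call the real
    service; here it returns canned data for illustration."""

    def fetch(self):
        return [{"id": "abc",
                 "snippet": {"title": "Intro",
                             "publishedAt": "2019-01-01"}}]

    def records(self):
        # Consumers always get versioned, normalized records, which is
        # what makes the resource behave like a "datahub".
        return [normalize_video(item) for item in self.fetch()]

recs = VideoDriver().records()
print(recs[0]["_schema"], recs[0]["title"])  # → video/0.1 Intro
```

Bumping SCHEMA_VERSION and keeping the old normalizer around is what would give the versioning behavior: each schema version pins an exact record shape.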
Given the tools we have right now (i.e., normalization (metaform), systems driving (metadrive), equation-model categories, and convenient tools like Python, GitHub and Dynalist), I think we can start focusing squarely on our mission, applying them to goal alignment of society at all levels. That involves understanding the connections between legislative activity, academic and patenting activity, and entrepreneurial activity, by building an open map of these levels of society, which we label, in that order, as the categories of "goals" (governance), "ideas" (academia), and "plans" (business).
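One way to picture the three-layer map is as linked records across the three categories. This is a minimal sketch with entirely invented data and link names ("supports", "implements" are hypothetical): it traces a business plan up through the idea it implements to the governance goal it serves.

```python
# Invented example records in the "goals", "ideas", "plans" categories,
# connected by explicit reference links.

records = {
    "goal:clean-energy": {"category": "goals",
                          "title": "Clean energy policy"},
    "idea:perovskite": {"category": "ideas",
                        "title": "Perovskite solar cells",
                        "supports": ["goal:clean-energy"]},
    "plan:solar-startup": {"category": "plans",
                           "title": "Solar panel startup",
                           "implements": ["idea:perovskite"]},
}

def trace_to_goals(key):
    """Follow 'implements'/'supports' links upward until goals."""
    rec = records[key]
    if rec["category"] == "goals":
        return [key]
    parents = rec.get("implements", []) + rec.get("supports", [])
    goals = []
    for parent in parents:
        goals.extend(trace_to_goals(parent))
    return goals

print(trace_to_goals("plan:solar-startup"))  # → ['goal:clean-energy']
```

With real legislation, patent, and company datasets normalized into such records, the same traversal would expose which goals a given line of entrepreneurial activity actually serves.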
Right now, I'm starting to work on several datasets that sketch a picture of these domains for a few countries. However, it would be very interesting to work on this more broadly, cover other countries, and connect the results.