Hi @zaneselvans thanks for asking great questions :+1:
Generally, you don’t need to change the raw data; instead, provide all this information in the metadata (the datapackage.json file). If you’re using our data CLI tool, it should guess things like encoding, delimiters and date formats and reflect them in the generated descriptor file. I would suggest reading this blog post on initializing data packages (https://datahub.io/blog/how-to-initialize-a-data-package-using-data-tool) and using interactive mode to control the process.
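For reference, here is roughly what such a descriptor looks like when written by hand. This is only a sketch: the package name, resource name, path, encoding and field names are made up, and the file the data CLI generates may differ in detail. The property names (encoding, dialect.delimiter, schema.fields with type/format) follow the Frictionless Data Package, Table Schema and CSV Dialect specs.

```python
import json

# Hand-written Data Package descriptor for illustration only.
# All names, the path and the encoding value are hypothetical.
descriptor = {
    "name": "example-package",
    "resources": [
        {
            "name": "sales",
            "path": "data/sales.csv",
            "format": "csv",
            "encoding": "iso-8859-1",       # the raw file is not UTF-8
            "dialect": {"delimiter": "|"},  # pipe-delimited instead of comma
            "schema": {
                "fields": [
                    # Non-ISO dates are described with a strptime-style pattern.
                    {"name": "date", "type": "date", "format": "%d/%m/%Y"},
                    {"name": "region", "type": "string"},
                    {"name": "amount", "type": "number"},
                ]
            },
        }
    ],
}

# Only the metadata file is written; the CSV itself is left untouched.
with open("datapackage.json", "w", encoding="utf-8") as f:
    json.dump(descriptor, f, indent=2)
```

The point is that encoding, delimiter and date format all live in the descriptor, so the raw data can stay exactly as it was published.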
datapackage.json file to see how the data files are described
|-delimited data on datahub - https://datahub.io/anuveyatsu/pipe-delimited
data CLI tool at the end of it (from Python if you want)
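On the consuming side, a tool only has to read the delimiter from the descriptor instead of assuming a comma. A minimal stdlib sketch, assuming a made-up resource entry and a few inline sample rows standing in for a pipe-delimited file like the one linked above:

```python
import csv
import io

# Hypothetical resource entry, as it might appear in datapackage.json.
resource = {"name": "sales", "dialect": {"delimiter": "|"}}

# Made-up sample rows standing in for the raw pipe-delimited file.
raw = "date|region|amount\n01/02/2018|north|10.5\n03/02/2018|south|7\n"

# Fall back to a comma when no delimiter is declared (the CSV Dialect default).
delimiter = resource.get("dialect", {}).get("delimiter", ",")

reader = csv.DictReader(io.StringIO(raw), delimiter=delimiter)
rows = list(reader)
print(rows[0])
```

Values come back as strings here; casting them to dates and numbers is what the schema part of the descriptor is for.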
@vaibhavgeek can you give a bit more detail on the issue with renaming files?
To upload a file, just follow the instructions here: https://datahub.io/docs/getting-started/publishing-data
Hi there. Just been testing out Google's new Dataset Search and found some spam datasets uploaded to the old datahub.io around 2013.
Where could/should I raise an issue to look at removing spam? Thanks
See screenshot above and visit the page:
Do folks have a favorite easy to use package for visualizing and filtering data that's accessible via data packages? Something that a relative layperson could use?
The perfect thing would be something that already ingests tabular data but is made Data Package aware. Right now you can fall back to anything that can ingest CSV (which is pretty much every tool). I can suggest some tools for playing with data that would suit, and we could think about how to plug in Data Package support, as we have done with e.g. pandas.
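As a rough illustration of what "Data Package aware" adds on top of plain CSV ingestion, here is a small stdlib sketch that uses the Table Schema to turn strings into typed values. It is not any existing library's API: the schema and sample data are made up, only a few field types are handled, and a real tool would more likely build on an existing integration such as the pandas one mentioned above.

```python
import csv
import io
from datetime import datetime

# Hypothetical Table Schema for a made-up resource.
schema = {"fields": [
    {"name": "date", "type": "date", "format": "%d/%m/%Y"},
    {"name": "region", "type": "string"},
    {"name": "amount", "type": "number"},
]}

def cast(value, field):
    """Convert one CSV string using its Table Schema field.

    Only integer, number and date (with "default" or an explicit strptime
    pattern) are handled; other types are returned unchanged as strings.
    """
    ftype = field.get("type", "string")
    if ftype == "integer":
        return int(value)
    if ftype == "number":
        return float(value)
    if ftype == "date":
        fmt = field.get("format", "default")
        pattern = "%Y-%m-%d" if fmt == "default" else fmt  # ISO 8601 is the default
        return datetime.strptime(value, pattern).date()
    return value

def read_typed(text, schema, delimiter=","):
    """Yield one dict per CSV row, with values cast according to the schema."""
    fields = {f["name"]: f for f in schema["fields"]}
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        yield {name: cast(value, fields[name]) for name, value in row.items()}

raw = "date,region,amount\n01/02/2018,north,10.5\n"
rows = list(read_typed(raw, schema))
print(rows)
```

A CSV-only tool would see "01/02/2018" and "10.5" as text; with the schema applied they become a date and a float, which is what makes filtering and charting behave sensibly.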
Is there a recommended maximum file size for use with tabular data resources? When running
No, there is no limit for tabular data packages. This is a bug in data validate. Could you please open an issue at https://github.com/datahq/data-cli
I think you can use either route, and for bigger packages goodtables may be better (it is also used internally).
My other question here is whether any of the files can be chunked/partitioned - frictionlessdata/specs#620
I wanted to update our datasets on datahub.io/johnsnowlabs.
When pushing the dataset, this is what I got:
> Error! Max storage for user exceeded plan limit (5000MB)
However, the total size of the data that has been uploaded is ~200MB.
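To double-check the ~200MB figure against the 5000MB limit, you could sum the sizes of the files you pushed on your own machine. A minimal sketch: the folder name "datasets" is hypothetical, and it counts 1 MB as 10**6 bytes, which is an assumption since it is not stated whether the limit means MB or MiB.

```python
import os

def total_size_mb(root):
    """Sum the sizes of all regular files under `root`, in MB (10**6 bytes).

    Symlinks are skipped so the same file is not counted twice.
    A missing folder yields 0.0 because os.walk ignores errors by default.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total / 10**6

# "datasets" is a hypothetical local folder holding every package that was pushed.
print(f"{total_size_mb('datasets'):.1f} MB")
```

If the local total is far below the limit, that points to the quota being computed differently on the server side rather than to the data itself being too large.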
At the moment I am scraping the list of pages, but it would be great to somehow get an exhaustive list of what has been uploaded. My use case is that we have a list of 217 datasets that we want uploaded. However, only 197 were uploaded. How do we identify the ones that weren't processed?
I went through the data utility logs which seemed to have uploaded everything.
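Once both lists are in hand, finding the missing ones is just a set difference. A minimal sketch, assuming you already have the intended dataset names (e.g. the 217 from your list) and the uploaded names (e.g. from the scraped pages); the names below are placeholders, and how to obtain the uploaded list from DataHub is left open.

```python
# Placeholder inputs: the datasets we meant to publish, and the ones
# actually found on datahub.io/johnsnowlabs (e.g. from scraped pages).
expected = ["dataset-a", "dataset-b", "dataset-c"]
uploaded = ["dataset-a", "dataset-c"]

def missing_datasets(expected, uploaded):
    """Return names in `expected` that are absent from `uploaded`, keeping their original order."""
    uploaded_set = set(uploaded)
    return [name for name in expected if name not in uploaded_set]

print(missing_datasets(expected, uploaded))  # ['dataset-b']
```

This only works if both lists use the same naming; if the published names are slugified or lower-cased, normalize both sides the same way before comparing.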