These are chat archives for kite-sdk/kite

22nd
Oct 2014
Doyle Timberlake
@Timbercakes
Oct 22 2014 18:46
I realize this may be a novice question, but I am new to the world of big data. Can someone explain: if I create a new dataset to write to Hive, will the table be created, or do I first need to create the table in Hive manually? Also, I plan to use the dataset in the future, so where does the dataset hang out until Datasets.load() is called? Thanks!
Joey Echeverria
@joey
Oct 22 2014 20:39
Hey @Timbercakes, the table will be created when the dataset is created. If you have authorization enabled, such as with Apache Sentry, then you need to make sure you create the dataset as a user that has create-table privileges in Hive.
For Hive-backed datasets, the metadata all lives in the Hive metastore
so when you call load, it will read the metadata from there and then give you access to the data in the dataset by creating readers/writers
For HDFS-backed datasets, the metadata lives in a .metadata directory in the root of the dataset directory
and it's loaded from there
Finally, HBase-backed datasets store their metadata in a special HBase table
in general, Kite is built to be metadata agnostic
we have a MetadataProvider interface
which then has different implementations depending on the type of dataset
you can always extend a dataset type (technically a DatasetRepository, part of the SPI) to override the MetadataProvider
so, Hive-backed datasets are really HDFS datasets with a different MetadataProvider
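to make the pluggable-metadata idea a bit more concrete, here is a rough, self-contained sketch. It is not Kite's actual SPI: the `Descriptor` class, the simplified `MetadataProvider` methods, the two provider classes, `FileSystemRepository`, and the `/data/` location are all hypothetical names made up for illustration, and both providers just use an in-memory map in place of a real `.metadata` directory or Hive metastore. The only point is that one repository can store data the same way while the metadata lookup is swapped out.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataProviderSketch {

    // Hypothetical stand-in for a dataset's metadata (not Kite's real descriptor class).
    static final class Descriptor {
        final String schema;
        final String location;

        Descriptor(String schema, String location) {
            this.schema = schema;
            this.location = location;
        }
    }

    // Simplified, hypothetical take on the idea behind a metadata provider.
    interface MetadataProvider {
        void create(String name, Descriptor descriptor);
        Descriptor load(String name);
        String describe();
    }

    // Stands in for metadata kept in a .metadata directory under the dataset root.
    static final class FileSystemMetadataProvider implements MetadataProvider {
        private final Map<String, Descriptor> store = new HashMap<>();

        public void create(String name, Descriptor descriptor) { store.put(name, descriptor); }
        public Descriptor load(String name) { return store.get(name); }
        public String describe() { return "a .metadata directory under the dataset root"; }
    }

    // Stands in for metadata kept in the Hive metastore.
    static final class MetastoreMetadataProvider implements MetadataProvider {
        private final Map<String, Descriptor> store = new HashMap<>();

        public void create(String name, Descriptor descriptor) { store.put(name, descriptor); }
        public Descriptor load(String name) { return store.get(name); }
        public String describe() { return "the Hive metastore"; }
    }

    // One repository type: the data location is chosen the same way whichever provider is plugged in.
    static final class FileSystemRepository {
        private final MetadataProvider provider;

        FileSystemRepository(MetadataProvider provider) {
            this.provider = provider;
        }

        Descriptor create(String name, String schema) {
            Descriptor descriptor = new Descriptor(schema, "/data/" + name);
            provider.create(name, descriptor);
            return descriptor;
        }

        Descriptor load(String name) {
            Descriptor descriptor = provider.load(name);
            if (descriptor == null) {
                throw new IllegalArgumentException("No such dataset: " + name);
            }
            return descriptor;
        }

        String metadataHome() {
            return provider.describe();
        }
    }

    public static void main(String[] args) {
        FileSystemRepository hdfsBacked = new FileSystemRepository(new FileSystemMetadataProvider());
        FileSystemRepository hiveBacked = new FileSystemRepository(new MetastoreMetadataProvider());

        hdfsBacked.create("events", "{\"type\": \"string\"}");
        hiveBacked.create("events", "{\"type\": \"string\"}");

        // Same data location in both cases; only the home of the metadata differs.
        System.out.println(hdfsBacked.load("events").location + " via " + hdfsBacked.metadataHome());
        System.out.println(hiveBacked.load("events").location + " via " + hiveBacked.metadataHome());
    }
}
```

in this toy version, swapping the provider passed to `FileSystemRepository` changes where the metadata is looked up without touching how the data location is chosen, which is roughly the relationship between the two dataset types described above.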
hopefully that makes some sense
Doyle Timberlake
@Timbercakes
Oct 22 2014 21:01
Awesome, thanks! @joey