These are chat archives for kite-sdk/kite

6th
Oct 2014
Manthosh Kumar
@manthosh
Oct 06 2014 07:10
Is there a way to recover the Avro .tmp files via Kite if the program fails before close() is called?
Joey Echeverria
@joey
Oct 06 2014 14:57
Right now that process is manual. All you need to do is rename the file, and it will contain all of the records up to the last flush() (provided HDFS didn't go down) or sync().
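The manual rename described above could be scripted along these lines. This is only a sketch under stated assumptions: it uses `java.nio.file` on a local path (on HDFS the same rename would go through the Hadoop filesystem tools instead), and the class `TmpFileRecovery`, its methods, and the naming rule (drop the `.tmp` suffix and one leading dot, if present) are hypothetical choices for illustration, not something Kite provides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TmpFileRecovery {
    // Assumption: the temporary name is the final name plus ".tmp",
    // optionally hidden behind one leading dot. Adjust to the names you actually see.
    static String recoveredName(String tmpName) {
        if (!tmpName.endsWith(".tmp")) {
            throw new IllegalArgumentException("not a .tmp file: " + tmpName);
        }
        String name = tmpName.substring(0, tmpName.length() - ".tmp".length());
        if (name.startsWith(".")) {
            name = name.substring(1);
        }
        if (name.isEmpty()) {
            throw new IllegalArgumentException("nothing left after stripping: " + tmpName);
        }
        return name;
    }

    // Renames the file in place; fails rather than overwrite an existing target.
    static Path recover(Path tmpFile) throws IOException {
        String finalName = recoveredName(tmpFile.getFileName().toString());
        return Files.move(tmpFile, tmpFile.resolveSibling(finalName));
    }
}
```

Only the records written before the last flush() or sync() can be expected in the renamed file, as noted above.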
Manthosh Kumar
@manthosh
Oct 06 2014 19:12
And what's the difference between flush() and sync()? I can't find any.
Joey Echeverria
@joey
Oct 06 2014 22:06
flush() guarantees that the data is no longer in client buffers. In particular, it means that the data has been sent to at least 3 data nodes (assuming the default replication level; if you change it, it will be N data nodes).
sync() provides the same, with the further guarantee that the data is no longer in OS buffers and has actually been synced to stable storage.
flush() means your client can now fail, and everything up to the last flush has made it to the data nodes
sync() means that your cluster can fail and the data is persisted so long as the underlying storage is readable
does that make sense?
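The same two levels of durability exist on a local filesystem, which gives a rough analogy for the distinction above. The sketch below is not Kite or HDFS code: the class `FlushVsSync` and its method are hypothetical, and it relies only on the standard `BufferedOutputStream.flush()` and `FileDescriptor.sync()` calls. Hadoop's own output streams expose a comparable pair, `hflush()` and `hsync()`.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FlushVsSync {
    // Writes two chunks to a temporary file and returns the resulting file size.
    static long writeWithFlushThenSync(byte[] first, byte[] second) {
        try {
            Path file = Files.createTempFile("flush-vs-sync", ".bin");
            try (FileOutputStream raw = new FileOutputStream(file.toFile());
                 BufferedOutputStream out = new BufferedOutputStream(raw)) {
                out.write(first);
                // flush(): the bytes leave this process's buffer. The process can
                // now crash without losing them, but they may still sit in OS caches.
                out.flush();

                out.write(second);
                out.flush();
                // sync(): additionally asks the OS to push its buffers to the
                // storage device, so the bytes should survive a machine failure.
                raw.getFD().sync();
            }
            long size = Files.size(file);
            Files.delete(file);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stronger guarantee of sync() usually costs more latency, so a common pattern is to flush often and sync only at points where losing data to a cluster-wide failure would be unacceptable.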