These are chat archives for CommBank/maestro

23rd
Sep 2014
Stephan Hoermann
@stephanh
Sep 23 2014 00:55
When someone has a chance, can you please review CommBank/ebenezer#53? It just adds a test for reading and writing large thrift structs with parquet.
Sam Roberts
@SamRoberts
Sep 23 2014 01:52
@stephanh done
Stephan Hoermann
@stephanh
Sep 23 2014 04:12
@vineethvarghese can you please review CommBank/ebenezer#55.
Sam Roberts
@SamRoberts
Sep 23 2014 04:30

Cloudera don't want to put the patched parquet jars on their public repo, citing the number of patched jars that they give to different customers, and that it would be too difficult for them to publish and support all of these jars. Pretty much what I expected, and I think it is understandable.

I have asked them to take all the jars that have been modified and attach them to the case I made. I will then upload those jars to our own repository.

Stephan Hoermann
@stephanh
Sep 23 2014 04:31
Ok
Vineeth Varghese
@vineethvarghese
Sep 23 2014 04:31
ok
Stephan Hoermann
@stephanh
Sep 23 2014 05:27
@vineethvarghese can I merge the code?
Vineeth Varghese
@vineethvarghese
Sep 23 2014 05:29
yeah..have commented there
Antonios Chalkiopoulos
@Antwnis
Sep 23 2014 15:20
^ Sam, will that not make it hard for those outside CommBank to get access to the 'corrected' JARs? (How about using conjars and adding a custom -maestro name?)
Sam Roberts
@SamRoberts
Sep 23 2014 21:54
@Antwnis we have a publicly accessible repo we can upload things to: https://commbank.artifactoryonline.com/commbank/webapp/home.html
Sam Roberts
@SamRoberts
Sep 23 2014 23:56

Does anyone know if it's an issue to create a parquet table with ROW FORMAT DELIMITED while specifying the parquet INPUTFORMAT and OUTPUTFORMAT?

The docs seem to state you should not be using ROW FORMAT DELIMITED with parquet input and output formats in hive 0.10, but it's not at all clear how important this is:

Stephan Hoermann
@stephanh
Sep 23 2014 23:57
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
Sam Roberts
@SamRoberts
Sep 23 2014 23:58
I ask because the hive table I am looking at right now on the cluster is using ROW FORMAT DELIMITED with parquet input and output formats. But I have no idea how widespread that practice is.
Stephan Hoermann
@stephanh
Sep 23 2014 23:58
Is Hive able to read from that table?