These are chat archives for CommBank/maestro

28th
Sep 2014
Quinton Anderson
@quintona
Sep 28 2014 22:00
@stephanh Here is the response I have received from Cloudera on the version compatibility discussion:
Quinton,

Also, one other clarification if you are asking about future parquet format changes: The move from version 1 to version 2 will be a backward-compatible change. So all data written with the old version will be readable with the new version. The new format version is currently used by the 1.x version of parquet-mr, but it is not the default. The upstream plan is to move to version 2 of the format as the default by moving from major version 1 to major version 2 of parquet-mr.

To summarize:
* parquet-format 2.x can read parquet-format 1.x
* parquet-mr 1.x uses parquet-format 2.x, but does not write 2.x files
* parquet-mr 2.x will write 2.x files by default

No migration is needed, but we do need you guys to be aware that moving from parquet-mr 1.x to 2.x will produce files that are not readable by some prior versions. That means that we don't recommend updating to 2.x and then reverting back to parquet-mr 1.x.

Please let us know if this answers your questions.

Thanks,
Robert Justice
Thu Sep 25 13:06:19 GMT 2014    Created by: Robert Justice
Quinton,

Did the explanation below from the developers answer your question?

Thanks,
Robert Justice
Wed Sep 24 15:47:01 GMT 2014    Created by: Robert Justice
Quinton,

The answer back from the Parquet developers is as follows:

"No migration is necessary. There are two things here: the version of the Parquet format and the version of the Parquet Java libraries (the parquet-mr project). We support Parquet format version 1, which is the format written by the Parquet Java libraries we ship (used by Hive etc), and Impala.

Different versions of CDH ship different versions of Parquet Java libraries. In CDH5.1 we ship parquet-mr 1.2.5, but this actually has bugfix patches from releases up to parquet-mr 1.5.0 applied to it, so it's not the same as upstream 1.2.5."

If there is some feature or fix you are interested in upstream Parquet, please let us know and I can create a request to show your need for this to our development and project management teams.

Thanks,
Robert Justice
We need to discuss. Our compatibility testing is going to have to be far more complete
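For the compatibility testing, the rules in the summary above could be written down roughly like this. This is only a sketch: the table and function names are hypothetical and not part of any parquet API, and it assumes the worst case that parquet-mr 1.x cannot read format 2 files at all (the reply only says "some prior versions" cannot).

```python
# Hypothetical model of the compatibility rules from the Cloudera summary.
# None of these names come from parquet-mr; they are illustrative only.

# Format version each parquet-mr major version writes by default.
DEFAULT_WRITE_FORMAT = {1: 1, 2: 2}

# Highest format version each parquet-mr major version is assumed to read.
# ASSUMPTION: 1.x is treated as unable to read format 2 (worst case).
MAX_READ_FORMAT = {1: 1, 2: 2}


def can_read(reader_major, file_format):
    """True if a reader of this major version can read files of this format."""
    return file_format <= MAX_READ_FORMAT[reader_major]


def safe_to_move(from_major, to_major):
    """True if files written by from_major stay readable after moving to to_major."""
    return can_read(to_major, DEFAULT_WRITE_FORMAT[from_major])


if __name__ == "__main__":
    for src, dst in [(1, 2), (2, 1)]:
        print(f"parquet-mr {src}.x -> {dst}.x safe: {safe_to_move(src, dst)}")
```

Under those assumptions the upgrade from 1.x to 2.x is safe, while reverting from 2.x to 1.x is not, which matches the advice in the reply.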
Sam Roberts
@SamRoberts
Sep 28 2014 23:32
Do Cloudera have a timeline for moving back to a mainline version of parquet?
Stuart Horsman
@StuHorsman
Sep 28 2014 23:36
@SamRoberts they'll always package parquet with their own version like that. It's the way they branch for version freezes. They do this months before the actual GA release.
Sam Roberts
@SamRoberts
Sep 28 2014 23:39
@StuHorsman sure, but what's the timeline for branching off from a higher version of parquet?
Stuart Horsman
@StuHorsman
Sep 28 2014 23:41
So they call that a "rebase". So for a GA release of CDH6, which will be sometime in April 2015, they'll rebase on a new version of Parquet in Nov/Dec 2014
Sam Roberts
@SamRoberts
Sep 28 2014 23:42
ok
Stuart Horsman
@StuHorsman
Sep 28 2014 23:42
currently 1.5 for parquet-mr. Make sense?
Stephan Hoermann
@stephanh
Sep 28 2014 23:43
I'm on the 7th floor today
Sam Roberts
@SamRoberts
Sep 28 2014 23:44
a continually rebased patch-branch workflow is entirely sensible
I'd just like some idea of what to expect (which they may or may not have provided, but I personally had no idea what the future plans were)
Stuart Horsman
@StuHorsman
Sep 28 2014 23:46
It's a good bet that in November/December they'll rebase on whatever the latest version is, unless there's some compelling reason not to.
Sam Roberts
@SamRoberts
Sep 28 2014 23:47
this is a regular yearly occurrence?
Vineeth Varghese
@vineethvarghese
Sep 28 2014 23:49
@stephanh Do we have the standup today?
Stephan Hoermann
@stephanh
Sep 28 2014 23:49
No, instead we have the sprint review and planning session at 1:30pm