These are chat archives for CommBank/maestro

8th
Apr 2014
Quinton Anderson
@quintona
Apr 08 2014 22:13

Morning guys, feedback from Andre at concurrent:

Things are a bit slower than I expected, but I am getting there. Chris was travelling all of last week, so we could only catch up yesterday to discuss some things that have to be done in Cascading itself to make the partitioning work. (Hive does not discover partitions by itself; you have to register them as they appear, and for that to work in a maintainable way I had to open up an internal Cascading API.) I am going to make the Cascading 2.6-wip builds public today, and they will be a requirement for cascading-hive in order to get the partitioning to work. The partitioning support in cascading-hive will be pushed to GitHub later today, so you can give it a try.

For the Parquet discussion: we would prefer it if you guys could keep the ebenezer/scalding dependencies in a project that you maintain, so that we can keep our own dependencies to a minimum. Please let me know if that works for you.

I just pushed the partitioning support, along with some small changes left and right. There is a new demo app that uses the partitioning feature. Please note that it only works if you have a remote Hive MetaStore, since the different partitions are registered as they are created by the Cascading flow (cluster side).
Tomorrow I am going to focus on the views.
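
For anyone following along, the point above that Hive does not discover partitions by itself can be illustrated with a minimal sketch. This is not how cascading-hive does it (according to the feedback, it registers partitions against the remote metastore from the Cascading flow); the sketch only builds the equivalent HiveQL `ALTER TABLE ... ADD PARTITION` statement that you would otherwise run by hand for each new partition directory. The table name, warehouse path, and helper function names are all hypothetical.

```python
# Sketch only: builds the HiveQL needed to register one partition manually.
# The table name, location, and partition keys used below are hypothetical.

def partition_spec(partition):
    """Render {"year": "2014"} as year='2014' for a PARTITION (...) clause."""
    return ", ".join(f"{key}='{value}'" for key, value in partition.items())


def partition_path(table_location, partition):
    """Build the conventional key=value directory path under the table location."""
    segments = "/".join(f"{key}={value}" for key, value in partition.items())
    return f"{table_location.rstrip('/')}/{segments}"


def add_partition_statement(table, table_location, partition):
    """Return the statement that makes Hive aware of an existing partition directory."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({partition_spec(partition)}) "
        f"LOCATION '{partition_path(table_location, partition)}'"
    )


if __name__ == "__main__":
    print(
        add_partition_statement(
            "events",
            "/user/hive/warehouse/events",
            {"year": "2014", "month": "04"},
        )
    )
```

Until a statement like this (or the equivalent metastore call) has run, queries against the table will not see the data in the new directory, which is why the flow needs a metastore it can reach from the cluster.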
@stephanh, there is a pull request against cascading-hive that has not been merged yet and that maestro depends on. It is important to either merge it or reject it before proceeding further. Can you please review it against the latest branch and make a call?
In order to get the partitioning working, a change was needed to Cascading core. A few issues arise from that:
  • we will need to fork scalding to pick up the WIP Cascading version
  • this has refactoring implications across our project structures, which we need to discuss
  • finally, given the need for a remote metastore for job execution, the Hive support just became extremely difficult to test (short of integrating vagrant into specs2). I am open to suggestions on this front, but we can't leave the testing the way it is. This is probably a separate concern from the work that Max has done around functional testing.
@stephanh please let me know if you have any permissions issues getting through the cascading-hive work. Thanks.
Quinton Anderson
@quintona
Apr 08 2014 23:19
@stephanh, when you did the testing with @jamindaw, did Hive read the partitions correctly?
Stephan Hoermann
@stephanh
Apr 08 2014 23:21
@quintona is the pull request at https://github.com/ConcurrentCore/cascading-hive? I couldn't find it on CommBank. I don't have access to ConcurrentCore.
Quinton Anderson
@quintona
Apr 08 2014 23:44
What's the best approach here? I can take their branch and push it to another branch in CommBank? I will request access in the meantime, but it won't happen immediately.
Quinton Anderson
@quintona
Apr 08 2014 23:54
@stephanh I have pushed to the following branch: https://github.com/CommBank/cascading-hive/tree/topic/paritioning
A diff between that and wip-1.0 will give you the contents of the pull request.
@jamindaw did the partitions read correctly in your testing with Stephan?