These are chat archives for CommBank/maestro

18th
May 2015
Jacob Stanley
@jystic
May 18 2015 00:01
Does anyone know the status of CommBank/ebenezer#109 ?
Vineeth Varghese
@vineethvarghese
May 18 2015 00:04
@jystic @stephanh is still working on it. He is just held up with an issue running the tests. When he is back he could shed more light on it.
Jacob Stanley
@jystic
May 18 2015 00:09
@vineethvarghese is he away this week?
Stephan Hoermann
@stephanh
May 18 2015 00:11
@jystic The tests on travis seem to run out of memory.
Jacob Stanley
@jystic
May 18 2015 00:18
@stephanh cool, looks like you're actively working on it, that's all I wanted to know :) If there's anything I can do to help I'd be happy to.
Stephan Hoermann
@stephanh
May 18 2015 00:24
@jystic if you find a way to make sbt run tests across subprojects sequentially that would be great :smile:. The problem is that while we can tell sbt to run tests sequentially within a subproject, it runs the subprojects themselves in parallel, and each test sets system properties for the whole JVM. Earlier versions of spec seem to have been more forgiving of that, but the current version runs into concurrency issues. A nice way to address that is to get sbt to fork the JVM for each subproject.
However, if I try to run the tests on travis it crashes with an EOFException which I think is caused by it running out of memory.
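For reference, the forking approach described here could be sketched in sbt 0.13 syntax roughly as follows. This is a minimal sketch, not ebenezer's actual build: the heap size is a placeholder, and the settings would have to be applied to each subproject.

```scala
// build.sbt fragment, sbt 0.13 syntax. A sketch only, not ebenezer's build.

// Run this subproject's tests in a separate JVM so that system properties
// set by a test do not leak into other subprojects' tests.
fork in Test := true

// Run the test classes within the subproject one at a time.
parallelExecution in Test := false

// Options for the forked JVM; the heap size is a placeholder, not a recommendation.
javaOptions in Test += "-Xmx1G"
```

Each forked JVM has its own heap, so forking several subprojects at once can raise total memory use, which may matter on a memory-constrained CI host.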
Jacob Stanley
@jystic
May 18 2015 00:30
Hmm ok, Laurence may actually have solved that in the uber.scoring repo, I'll see if there's anything interesting in our settings
Jacob Stanley
@jystic
May 18 2015 00:45
Ok, I can't see anything that would suggest we can run subprojects sequentially

We do have this in all projects:

concurrentRestrictions in Global := Seq(
  Tags.limit(Tags.CPU, 2),
  Tags.limit(Tags.Network, 10),
  Tags.limit(Tags.Test, 1),
  Tags.limitAll(15)
)

But according to the docs it won't have any effect unless you tag tasks.

Our javaOptions are fairly crazy:
    javaOptions ++= Seq("-Xms2048M", "-Xmx8192M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
Laurence Rouesnel
@laurencer
May 18 2015 01:02
@jystic that will make the subprojects run sequentially (because the setting is Global, the Test limit means that only 1 test task across all subprojects can run at once)
Todd Owen
@toddmowen
May 18 2015 01:02
Does anyone use the latest build of maestro, or is the most recent build in production whatever etl.util points to (last changed Apr 10)?
Laurence Rouesnel
@laurencer
May 18 2015 01:02
This is a fairly good summary of how SBT enables parallelization and schedules different tasks: http://www.scala-sbt.org/0.13/docs/Parallel-Execution.html
Jacob Stanley
@jystic
May 18 2015 01:03
@laurencer cool, that's the docs I was reading
Laurence Rouesnel
@laurencer
May 18 2015 01:03
@stephanh see above comments - and what we do for uber.scoring
Jacob Stanley
@jystic
May 18 2015 01:03
I think perhaps the CPU and Network tags aren't doing anything in our case? Unless we do some tagging that I didn't see.
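As a sketch of what explicit tagging looks like: according to the sbt 0.13 parallel-execution docs, some built-in tasks are tagged by default (test carries Tags.Test, for example), while a limit on a custom tag only affects tasks that have been tagged explicitly. The tag `Fixtures` and the task `loadFixtures` below are hypothetical names used only for illustration.

```scala
// build.sbt fragment, sbt 0.13 syntax.
// `Fixtures` and `loadFixtures` are hypothetical names, not part of any CommBank build.

// A custom tag that can be referenced in concurrentRestrictions.
val Fixtures = Tags.Tag("fixtures")

val loadFixtures = taskKey[Unit]("Hypothetical task that should not overlap with itself")

// The tag is attached to the task implementation, which is then bound to the key.
def loadFixturesImpl = Def.task {
  println("loading fixtures")
} tag(Fixtures)

loadFixtures := loadFixturesImpl.value

// At most one Fixtures-tagged task and at most one Test-tagged task at a time,
// across all subprojects because the setting is scoped to Global.
concurrentRestrictions in Global := Seq(
  Tags.limit(Fixtures, 1),
  Tags.limit(Tags.Test, 1)
)
```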
Jonathan Merritt
@lancelet
May 18 2015 01:07
@jystic the javaOptions are just to expand resources for SBT. I'm not 100% confident that CMS is the default GC for Java 7 anymore; I have heard rumours that JRE7 uses the G1 collector by default (can anyone confirm?). So, you might have to add -XX:+UseConcMarkSweepGC just to be sure CMS is enabled.
Jacob Stanley
@jystic
May 18 2015 01:09
@lancelet So I have that set in my sbt launcher, but perhaps it should be a project setting
Stephan Hoermann
@stephanh
May 18 2015 01:09
@laurencer @jystic I tried those settings and it seems to break the hive tests in ebenezer. Not sure why.
Jonathan Merritt
@lancelet
May 18 2015 01:11
@jystic I'm not sure where it has to be in order to take effect. If compilation is happening in the original Java process, then presumably the GC has to be set on the launcher script. However, if compilation is forked, then I guess it could be a project setting...?
@jystic we need a JVM guru who knows these things; and who can tell us about profiling remote Java processes properly and all that good stuff... :smile:
I tinker with JVisualVM occasionally, but I really wish I knew what I was doing.
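On where the flags take effect: sbt's javaOptions setting is only applied to forked JVMs, so it does not change the GC of the sbt process itself, where Scala compilation normally runs. Flags for that process have to be passed to the launcher instead. Note also that CMS is not the default collector on Java 7 HotSpot, and G1 only became the default in JDK 9, so CMS has to be requested explicitly. A minimal sketch in sbt 0.13 syntax, with placeholder values:

```scala
// build.sbt fragment, sbt 0.13 syntax. Values are placeholders.

// javaOptions is only honoured when the task runs in a forked JVM.
fork in Test := true

javaOptions in Test ++= Seq(
  "-Xmx2G",                        // placeholder heap size
  "-XX:+UseConcMarkSweepGC",       // request CMS explicitly
  "-XX:+CMSClassUnloadingEnabled"  // only meaningful when CMS is in use
)
```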
Rowan Davies
@rowandavies
May 18 2015 01:45
@toddmowen Generally maestro client code depends on maestro via etl.util. But there’s at least one exception, and probably others: CommBank/eventually seems to depend on it directly; at least it has a maestroVersion set directly in its build.scala.
Todd Owen
@toddmowen
May 18 2015 01:46
@rowandavies thanks
Luke Williams
@shmookey
May 18 2015 01:47
just about to update etl.util :)
Todd Owen
@toddmowen
May 18 2015 01:48
@shmookey good timing!
Luke Williams
@shmookey
May 18 2015 01:48
ok, we're now at 2.10.0-20150512232913-778ce98
Jonathan Merritt
@lancelet
May 18 2015 01:51
@rowandavies yes, we currently depend on Maestro directly, mostly for loading things. We can be very adaptable though, so don't worry if there are any breaking changes.
Tin Pavlinic
@triggerNZ
May 18 2015 02:40
I noticed that when building maestro, it looks at the "commbank-releases-private" at "https://commbank.artifactoryonline.com/commbank/libs-releases-local"
it never actually pulls anything from there but it shouldn't even look
i was considering making a change in uniform: def uniformDependencySettings: Seq[Sett] = uniformPublicDependencySettings ++ uniformPrivateDependencySettings
and importing uniformPublicDependencySettings in maestro instead. any thoughts?
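The proposed split could look roughly like the sketch below. Only the three method names and the private resolver come from the messages above; the `Sett` alias definition, the enclosing object name and the choice of public resolver are assumptions for illustration, not uniform's actual code.

```scala
import sbt._
import Keys._

// Hypothetical container; uniform's real layout may differ.
object UniformDependencySketch {
  // Assumed alias for a single sbt setting.
  type Sett = Def.Setting[_]

  // Resolvers that need no credentials (the resolver chosen here is illustrative).
  def uniformPublicDependencySettings: Seq[Sett] = Seq(
    resolvers += Resolver.sonatypeRepo("releases")
  )

  // The private Artifactory resolver, which requires credentials.
  def uniformPrivateDependencySettings: Seq[Sett] = Seq(
    resolvers += "commbank-releases-private" at "https://commbank.artifactoryonline.com/commbank/libs-releases-local"
  )

  // The existing entry point keeps its behaviour by combining both.
  def uniformDependencySettings: Seq[Sett] =
    uniformPublicDependencySettings ++ uniformPrivateDependencySettings
}
```

A public project such as maestro would then use only uniformPublicDependencySettings, so sbt would not need to contact the private repository at all.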
Andrew Cowie
@afcowie
May 18 2015 02:59
@triggerNZ that will address the rampant Unable to find credentials for [Artifactory Realm @ commbank.artifactoryonline.com] that we're getting, yeah?
Tin Pavlinic
@triggerNZ
May 18 2015 02:59
yes
Conrad Parker
@kfish
May 18 2015 03:02
@triggerNZ sounds sane
i'm sure @stephanh will say to please make a pull request for it :)
Stephan Hoermann
@stephanh
May 18 2015 03:33
@triggerNZ sounds good!
Conrad Parker
@kfish
May 18 2015 03:35
@jystic did you need a different subdir on bintray?
Jacob Stanley
@jystic
May 18 2015 03:36
I just need scrooge to work, so I think I need the same as you guys
Stephan Hoermann
@stephanh
May 18 2015 03:36
@jystic ebenezer is now done. 0.18.0-20150518023229-cda03fa.
Conrad Parker
@kfish
May 18 2015 03:37
Jacob Stanley
@jystic
May 18 2015 03:38
@stephanh Magic! :sparkles: :clap:
Tin Pavlinic
@triggerNZ
May 18 2015 03:39
@stephanh roger that, i'll raise a PR
Andrew Cowie
@afcowie
May 18 2015 04:07
Is Java 8 a problem? I'm having problems, but before I raise them here I thought I should find out whether I did the wrong thing by installing whatever version of Java Oracle thought I should install. I notice the self-service thing has 7u75, but that's failing to install {charming}
Stephan Hoermann
@stephanh
May 18 2015 04:13
We use Java 7. There could be issues with trying to build against Java 8.
Luke Williams
@shmookey
May 18 2015 04:21
@stephanh does CommBank/maestro#309 belong on omnitool now? i figure i'll just add a unit test to it for that specific issue
oh, there already is one, and it works
Luke Williams
@shmookey
May 18 2015 04:28
nevermind, i don't think it actually does test for that
Stephan Hoermann
@stephanh
May 18 2015 06:12
68 deprecation warnings in maestro-macros.
Kristian Domagala
@dkristian
May 18 2015 06:18
I'm trying to get the maestro-example code running on a local CommBank/cdh5.3.0 vagrant vm
the accompanying scripts and test directory structure have fallen out of sync with the code, but I've managed to make progress by adding the various missing flags and restructuring the data
there's a known joda-time jar incompatibility on that image that's also been fixed
Vineeth Varghese
@vineethvarghese
May 18 2015 06:20
@dkristian feel free to submit a PR once you've sorted it out :)
Todd Owen
@toddmowen
May 18 2015 06:20
@dkristian :+1:
Kristian Domagala
@dkristian
May 18 2015 06:20
I certainly will be
just struggling to get past a runtime exception now
underlying cause is org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
does this mean anything to anyone?
it's thrown from CustomerJob.scala:59 when trying to instantiate HiveMetaStoreClient
Vineeth Varghese
@vineethvarghese
May 18 2015 06:23
haven't seen that one. Can you please also create an issue against maestro so that we don't forget about this?
Kristian Domagala
@dkristian
May 18 2015 06:23
sure thing
Vineeth Varghese
@vineethvarghese
May 18 2015 06:23
Thanks @dkristian
Kristian Domagala
@dkristian
May 18 2015 06:24
I was just wondering if it's a problem specific to the cdh5.3.0 environment, or something more general
what local environments are normally used to run jobs?
Stephan Hoermann
@stephanh
May 18 2015 06:26
@dkristian have a look at https://github.com/CommBank/maestro/#hive
Gavin Whyte
@gavinwhyte
May 18 2015 06:26
Team City should be good to go.
Stephan Hoermann
@stephanh
May 18 2015 06:26
you will need to ensure that these properties are on the classpath by either adding them to the mapred-site.xml or hive-site.xml and adding the hive-site.xml to HADOOP_CLASSPATH.
Vineeth Varghese
@vineethvarghese
May 18 2015 06:28
@dkristian Hopefully @stephanh 's suggestion would sort things out for you
Kristian Domagala
@dkristian
May 18 2015 06:28
thanks, will give that a go
Stephan Hoermann
@stephanh
May 18 2015 06:31
@dkristian based on the settings you set in the properties Hive will dynamically try and load different classes, etc.
It's awesome :-1:
Jost Berthold
@jberthold
May 18 2015 06:32
@dkristian @vineethvarghese I had similar issues when running a hydro that uses ebenezer-hive
The command that succeeded in the end featured
HADOOP_CLASSPATH=/etc/hbase/conf:/etc/hive/conf:<the-jar-I-am-running> hadoop jar <the-jar-I-am-running> blabla...
without passing the jar in the class path, my app failed on another missing class (some scheme)
Kristian Domagala
@dkristian
May 18 2015 06:33
thanks for the tips!
Vineeth Varghese
@vineethvarghese
May 18 2015 06:47
Yeah HADOOP_CLASSPATH needs to be set correctly before running jobs. I assumed this was already set.
Luke Williams
@shmookey
May 18 2015 06:53
hey guys, i have a doctor's appointment early tomorrow morning. shouldn't make me late, just thought i'd mention it in case i end up stuck in the waiting room for 45 minutes again
Vineeth Varghese
@vineethvarghese
May 18 2015 07:16
Can someone please review a trivial PR, CommBank/etl-controller#22?
Stephan Hoermann
@stephanh
May 18 2015 07:26
I got a working version of Maestro for Scala 2.11. There are still lots of deprecation warnings to fix. As a nice bonus dependency resolution caching now seems to work.
Stephan Hoermann
@stephanh
May 18 2015 07:50
And it built first go on Travis which is amazing :smile:
Sam Roberts
@SamRoberts
May 18 2015 11:18
the error reporting must be broken
Conrad Parker
@kfish
May 18 2015 20:32
@stephanh :thumbsup:
Jacob Stanley
@jystic
May 18 2015 23:30

Has anyone seen this sort of error before when using ThermometerHiveSpec:

[08:55:03][Step 2/6] [error]  No files found under </tmp/hadoop/test-186e22ec-2964-48de-aabd-f9cf7f1c277d/hive/warehouse/scoresTestTable/partition_model_id=modelId1/partition_score_date=20140805/*.parquet>. (PathFact.scala:37)

I have a test which works perfectly locally, but fails when run on a TeamCity build agent.

Conrad Parker
@kfish
May 18 2015 23:31
@jystic this is from a new test you added?
Jacob Stanley
@jystic
May 18 2015 23:40
@kfish yep, well, it's an old test that I've re-enabled after upgrading thermometer, but it works fine locally
Stephan Hoermann
@stephanh
May 18 2015 23:51
@jystic are your tests running in parallel?
Jacob Stanley
@jystic
May 18 2015 23:54
Yes, although this is the only hive/parquet test