These are chat archives for broadinstitute/hail

3rd
Aug 2016
Tim Poterba
@tpoterba
Aug 03 2016 14:12
@cseed I’d like to chat for 15 minutes about the partitioner key implicit conversions if you have a chance today
cseed
@cseed
Aug 03 2016 14:29
Sure thing, after lunch?
Tim Poterba
@tpoterba
Aug 03 2016 14:29
works for me
cseed
@cseed
Aug 03 2016 14:34
Also, once the two PRs I made last night get merged in (grm, annotatevariants intervals —all), can you build a new jar? Kyle and Maryam are both eagerly awaiting.
Tim Poterba
@tpoterba
Aug 03 2016 14:34
sure
cseed
@cseed
Aug 03 2016 14:47
Dan, I added an issue for the higher kinded types change: broadinstitute/hail#537
cseed
@cseed
Aug 03 2016 14:57
@lfrancioli Looking at lf_head. I totally misunderstood on first glance. You’re right, I really like it. Quick Q: what’s nvp stand for? Number (v?) per partition?
Laurent Francioli
@lfrancioli
Aug 03 2016 15:09
I think it was number of variants per partition
and glad you like it! :)
if you have a better suggestion than nvp, go for it
cseed
@cseed
Aug 03 2016 15:12
OK, makes sense! I’m making some comments on the github. I’d make a suggestion there. Variant isn’t probably the best in this context, since it should operate on general RDDs that might contain who-knows-what.
Laurent Francioli
@lfrancioli
Aug 03 2016 15:14
right
good point
cseed
@cseed
Aug 03 2016 15:20
OK, I made some comments on github. Mostly anal things concerning style and process. If you address those, it should be ready to go! Thanks for your continued patience :)
I’ll jump on your other PR after lunch.
Laurent Francioli
@lfrancioli
Aug 03 2016 15:25
awesome, thanks! I’ll checkout the comments! For this PR, should I still operate on the main repo?
cseed
@cseed
Aug 03 2016 15:27
No, please close this PR and reopen one from your forked repo.
Laurent Francioli
@lfrancioli
Aug 03 2016 15:27
OK!
cseed
@cseed
Aug 03 2016 15:28
Thanks!
Daniel King
@danking
Aug 03 2016 16:15
Is there a style recommendation on unnecessary type annotations for lambda arguments?
cseed
@cseed
Aug 03 2016 16:17
No, in general I think we leave them off but don’t mind them for readability. IntelliJ makes it very easy (Control-Shift-P) get get the type of a variable, so having it in the code isn’t really necessary.
Daniel King
@danking
Aug 03 2016 16:17

e.g. I'm writing this:

  def getPartition(parts: Int) : Gen[Array[Int]] = Gen { p => partition(p.rng, p.size, parts) }

But throughout Gen.scala I see:

(p: Parameters) => ...
cseed
@cseed
Aug 03 2016 16:17
I do prefer type declarations on all top-level and object-level declarations, including return types from functions.
Tim Poterba
@tpoterba
Aug 03 2016 16:18
so the second?
Daniel King
@danking
Aug 03 2016 16:18
Agreed, but what about the lambdas that are passed to Gen?
cseed
@cseed
Aug 03 2016 16:19
p => shows up a lot, too. It’s totally inconsistent. I’d be happy making them all p =>.
Daniel King
@danking
Aug 03 2016 16:19
:thumbsup:
Daniel King
@danking
Aug 03 2016 16:55
@cseed I think the builds aren't failing when they ought to, e.g.: http://52.91.58.64:7299/job/HailTest/job/PR-536/6/console
cseed
@cseed
Aug 03 2016 16:56
This is a rebase issue with master. Tim just changed the association of (v, va, gs) to (v, (va, gs)) (to make the RDD in VSM a “pair RDD”).
Tim Poterba
@tpoterba
Aug 03 2016 16:57
that build has a green checkmark on the github PR
cseed
@cseed
Aug 03 2016 16:57
Rebasing with master should fix it. Remember, the Jenkins PR builder automatically merges with master.
Tim Poterba
@tpoterba
Aug 03 2016 16:57
no, the problem isn’t that gradle is failing
it’s that jenkins is giving it a thumbs up even though it fails
cseed
@cseed
Aug 03 2016 16:58
Oh! I see. Ah, hmm. Well, that is bad.
I see the problem.
This is a bug in the build script. I’m running gradle | tee to capture the build scan URL but I’m losing the error code.
I have a fix, one moment.
Is the current master failing beause of this?
Tim Poterba
@tpoterba
Aug 03 2016 17:02
no
cseed
@cseed
Aug 03 2016 17:03
OK. I need to go get lunch, I will test the fix afterwards.
(answer: use set -o pipefail with tee)
Tim Poterba
@tpoterba
Aug 03 2016 17:05
the last commit has a successful build
Laurent Francioli
@lfrancioli
Aug 03 2016 18:15
@cseed : When squashing commits prior to a pull request, do you want all changes as a single commit? e.g. in the case of the head functionality ?
Jon Bloom
@jbloom22
Aug 03 2016 18:17
yes, example format:
added sort, added ascending parameter to sortBy
- added sort on Arrays in expr
- extended sort and sortBy to take an optional Boolean parameter for ascending
- modifed behavior to always place null values at the end
- updated HailExpressionLanguage.md
- added tests for sort and sortBy to ExprSuite
Laurent Francioli
@lfrancioli
Aug 03 2016 18:18
All right!
Jon Bloom
@jbloom22
Aug 03 2016 18:18
first line should be short, high level. the list enumerates the changes in more detail. Then GitHub displays the rest nicely automatically:
broadinstitute/hail#511
fyi, if you past plain text in triple backtick, use ```text at the top
Tim Poterba
@tpoterba
Aug 03 2016 18:20
or ```markdown
good to know
Laurent Francioli
@lfrancioli
Aug 03 2016 18:24
OK great, thanks !
cseed
@cseed
Aug 03 2016 18:59
The Jenkinsfile fix should be good. It was detecting test case failures correctly (I’m not sure how, probably parsing the output), but it wasn’t propagating the exit code on compile failures. I tested both cases and it is working now.
Also, since the new Jenkinsfile goes into master, all the branches automatically get tested with the proper configuration. Oh, so sweet!
Mitja Kurki
@Fedja
Aug 03 2016 21:17
Have you had problems getting gradle daemon to run? Probably after osx update the daemon is not run with —daemon. It promises during execution that subsequent compilation would be faster but they are not.
cseed
@cseed
Aug 03 2016 21:18
wmf3c-fc5:hail cseed$ gradle clean
Starting a new Gradle Daemon for this build (subsequent builds will be faster).
Using a seed of [1] for testing.
:clean

BUILD SUCCESSFUL

Total time: 4.31 secs
wmf3c-fc5:hail cseed$ gradle clean
Using a seed of [1] for testing.
:clean UP-TO-DATE

BUILD SUCCESSFUL

Total time: 0.63 secs
I’m running gradle 2.14 installed with brew.
Jon Bloom
@jbloom22
Aug 03 2016 21:21
brew installed 2.12 when i just upgraded, did you manually point to 2.14 or did you mean 2.12?
cseed
@cseed
Aug 03 2016 21:21
Nope, 2.14.
Make sure you brew update first to update the available recipes.
Although the daemon thing has been a feture for a long time. Maybe it was broken in 2.12?
Mitja Kurki
@Fedja
Aug 03 2016 21:23
I have 2.6
hmm
haven’t updated in long time… I wonder if they have changed the versioning scheme at some point
update it is then
just wanted to check if there is some known glitch
Jon Bloom
@jbloom22
Aug 03 2016 21:26

Arg, I’m getting

jbloom$ brew update
Error: /usr/local must be writable!

and sudo fails as well. I recall this happening before, and then I reinstalled brew and all the rest

drwxr-xr-x 20 root wheel 680 Jul 26 20:43 local
cseed
@cseed
Aug 03 2016 21:27
Yeah, my brew is all fucked up, the same.
Tim Poterba
@tpoterba
Aug 03 2016 21:27
sudo chown -R $USER /usr/local/
Jon Bloom
@jbloom22
Aug 03 2016 21:29
go Tim!
Laurent Francioli
@lfrancioli
Aug 03 2016 22:05
I feel like I keep asking the same question, but given a BaseType (e.g. TDouble) and a Querier, what’s the best way of getting the value in the corresponding Scala Type (e.g. Double) ?
cseed
@cseed
Aug 03 2016 22:06
So you have an x: Any wrapping a Double and t = TDouble and you want a Double? t match { case TDouble => x.asInstanceOf[Double] }
Laurent Francioli
@lfrancioli
Aug 03 2016 22:08
right
Tim Poterba
@tpoterba
Aug 03 2016 22:08
querier(va).map(_.asInstanceOf[Double]) if you’ve checked the type
cseed
@cseed
Aug 03 2016 22:08
Just for the record, this all makes me throw up in my mouth a little.
Tim Poterba
@tpoterba
Aug 03 2016 22:09
:poop:
Laurent Francioli
@lfrancioli
Aug 03 2016 22:09
So I just had to implement this:
def getStructFieldType(baseType: BaseType) : DataType = {
      baseType match {
        case TDouble => DoubleType
        case TInt => IntegerType
        case TString => StringType
        case TFloat => FloatType
        case TBoolean => BooleanType
        case TLong => LongType
        case _ => NullType //TODO FIXME this is probably not optimal :)
      }
    }
Tim Poterba
@tpoterba
Aug 03 2016 22:09
we have this already
Laurent Francioli
@lfrancioli
Aug 03 2016 22:09
ahhh...
Tim Poterba
@tpoterba
Aug 03 2016 22:09
SparkAnnotationImpex.exportType
Laurent Francioli
@lfrancioli
Aug 03 2016 22:09
where?
dude!
cseed
@cseed
Aug 03 2016 22:10
Impex stands for Im(port)Ex(port). We should probably rename it ImpEx.
What Tim said. We have Impex for JSON, Spark, TextTables (TSV), Cassandra and Solr.
Laurent Francioli
@lfrancioli
Aug 03 2016 22:11
OK great
impex is fairly unintuitive
but glad you pointed it out!
cseed
@cseed
Aug 03 2016 22:15
Good catch, Tim. That’s the last time I merge a commit without review.
Terrible name, I know. Alternate suggestions welcome.
Tim Poterba
@tpoterba
Aug 03 2016 22:16
are we doing bad name contests? my submission is my new class OrderedLeftPartitionKeyJoinRDD
cseed
@cseed
Aug 03 2016 22:16
Esp. bad becuase the partition key is on the right. :-)
Tim Poterba
@tpoterba
Aug 03 2016 22:17
OrderedLeftJoinRightPartitionKeyRDD
much better
cseed
@cseed
Aug 03 2016 22:17
OLJRPKRDD?
Hahaha.
Laurent Francioli
@lfrancioli
Aug 03 2016 22:19
hahaha
cseed
@cseed
Aug 03 2016 22:21
Gotta run. Later Hailers!
Laurent Francioli
@lfrancioli
Aug 03 2016 22:22
I seem to be stuck cause I have a BaseType and not a Type...
This message was deleted
This message was deleted
Tim Poterba
@tpoterba
Aug 03 2016 22:24
how did you get it?
query?
Laurent Francioli
@lfrancioli
Aug 03 2016 22:24
yes
Tim Poterba
@tpoterba
Aug 03 2016 22:24
query should be type
Laurent Francioli
@lfrancioli
Aug 03 2016 22:24
yes
Tim Poterba
@tpoterba
Aug 03 2016 22:24
hmm
Laurent Francioli
@lfrancioli
Aug 03 2016 22:24
no
Tim Poterba
@tpoterba
Aug 03 2016 22:25
ah this is from queryVA or something?
yeah, that should totally return Type instead of BaseType
Laurent Francioli
@lfrancioli
Aug 03 2016 22:25
//Get queriers
    val queriers = options.features.split(",")
      .map( {f =>
        val q = state.vds.queryVA(f)
        (f,q._1,q._2)
      })

    val schema = StructType(
      queriers.map({ case (f,t,q) => StructField(f,t.,true)})
    )
yes
Tim Poterba
@tpoterba
Aug 03 2016 22:25
you can just cast it for now
want to make an issue to fix that?
Laurent Francioli
@lfrancioli
Aug 03 2016 22:26
sure
I’m wondering if the cast will have the right behavior though
Tim Poterba
@tpoterba
Aug 03 2016 22:27
yeah
there’s no way currently to get a BaseType out of a queryVA
afaik, the only BaseType not a Type is TAggregable
Laurent Francioli
@lfrancioli
Aug 03 2016 22:28
OK
I’ll try casting