These are chat archives for broadinstitute/hail

4th
Aug 2016
Tim Poterba
@tpoterba
Aug 04 2016 02:21
@cseed: got ideas for good orderedRDD benchmarks?
right now I’m importing the EIGEN table (9b, 44g compressed) and annotating with itself
cseed
@cseed
Aug 04 2016 13:53
I prefer benchmarks that are real use cases. Self-annotation also is the best case scenario for ordered join since there is no excess computation. I’d try a full VEP pipeline, annotation on a real dataset (ExAC sites file, say) or Jiwoo’s big annotation import.
We’re going to have to start thinking about performance benchmarks as part of the tests soon.
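Why self-annotation is the best case for an ordered join: when both sides are sorted by the same key, the join is one linear merge pass with no hashing or shuffling. With identical key sets, no right-hand keys are ever skipped. The following is a minimal in-memory sketch of a sorted left merge join. It is not Hail's OrderedRDD code, and `MergeJoin`/`leftJoinSorted` are made-up names. It assumes the right side (the annotation table) has unique keys.

```scala
object MergeJoin {
  // Left outer join of two sequences that are already sorted by key.
  // Assumes keys in `right` are unique (one annotation per key);
  // duplicate keys on the left each pick up the same right-hand value.
  def leftJoinSorted[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)])(
      implicit ord: Ordering[K]): Seq[(K, (A, Option[B]))] = {
    val out = Vector.newBuilder[(K, (A, Option[B]))]
    val r = right.iterator.buffered
    for ((k, a) <- left) {
      // Advance past right-hand keys that sort before the current left key.
      while (r.hasNext && ord.lt(r.head._1, k)) r.next()
      // Match only if the next right-hand key equals the left key.
      val b = if (r.hasNext && ord.equiv(r.head._1, k)) Some(r.head._2) else None
      out += ((k, (a, b)))
    }
    out.result()
  }
}
```

The cost is proportional to the combined length of both inputs, which is why any "excess" right-hand rows (keys absent on the left) are the only overhead. In self-annotation there are none.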
Daniel King
@danking
Aug 04 2016 15:30
So the type Annotation is defined as Any, which causes the type inferencer to pick the wrong type sometimes. Why does Annotation need to be Any?
Tim Poterba
@tpoterba
Aug 04 2016 15:30
annotations are totally flexible / nested
an annotation could be a struct with a bunch of values, or just an int
the type inference only fails in IntelliJ; it compiles just fine
that also gives us a nice interface for handling them:
package object annotations {

  class AnnotationPathException(msg: String = "") extends Exception(msg)

  type Annotation = Any

  type Deleter = (Annotation) => Annotation

  type Querier = (Annotation) => Option[Any]

  type Inserter = (Annotation, Option[Any]) => Annotation

  type Assigner = (Annotation, Option[Any]) => Annotation

  type Merger = (Annotation, Annotation) => Annotation

  type Filterer = (Annotation) => Annotation
}
actually, all the Anys there should also be Annotations
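Because Annotation is Any, a value can be an Int, a string, or a nested struct, and a Querier only needs to walk a path into it. The following is a minimal sketch of that idea, using the aliases above with the Anys replaced by Annotation as suggested. The struct encoding here (an IndexedSeq of field values, with null meaning "missing") and the `querier` helper are assumptions for illustration. They are not Hail's actual representation or code.

```scala
object AnnotationsSketch {
  // Aliases from the chat, with Any replaced by Annotation.
  type Annotation = Any
  type Querier = Annotation => Option[Annotation]

  // Builds a Querier for a path of field indices into nested structs.
  // Hypothetical encoding: a struct is an IndexedSeq of fields; null = missing.
  def querier(path: List[Int]): Querier = path match {
    case Nil =>
      // End of the path: wrap the value, mapping null to None.
      (a: Annotation) => Option(a)
    case i :: rest =>
      val inner = querier(rest)
      (a: Annotation) => a match {
        case fields: IndexedSeq[_] if i >= 0 && i < fields.length => inner(fields(i))
        case _ => None // not a struct, or index out of range
      }
  }
}
```

For example, querying path `List(1, 0)` on `IndexedSeq(1, IndexedSeq("x", null))` yields `Some("x")`, while `List(1, 1)` hits the null and yields `None`.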
Daniel King
@danking
Aug 04 2016 15:33
What I mean by failure is that the type inferencer picks the wrong type (in my case, Array instead of Set) without explicit type annotations on the calls to Gen.buildableOf.
I've been introducing higher-kinded types in an attempt to reduce the number of explicit type annotations
Laurent Francioli
@lfrancioli
Aug 04 2016 15:35
Totally agree that the Anys should be Annotations… I was somewhat confused when I first looked at the signatures of these functions
Laurent Francioli
@lfrancioli
Aug 04 2016 15:47
What’s the best way of getting a Variant from the String “1:12345:A:T” ?
Tim Poterba
@tpoterba
Aug 04 2016 15:47
in what context
Laurent Francioli
@lfrancioli
Aug 04 2016 15:48
just within the code
Tim Poterba
@tpoterba
Aug 04 2016 15:48
TableAnnotationImpex can do it
Laurent Francioli
@lfrancioli
Aug 04 2016 15:49
OK
found it, thanks!
Would it make sense to move this logic to the Variant object though?
Tim Poterba
@tpoterba
Aug 04 2016 15:51
depends if we’re doing this a lot
shouldn’t be going Variant -> String -> Variant outside of table import/export
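For the "1:12345:A:T" case, a standalone parser is short. The following sketch assumes a hypothetical minimal `Variant(contig, start, ref, alt)` with a single alt allele. It is not Hail's Variant class or the parsing done in TableAnnotationImpex, and `VariantParse` is a made-up name.

```scala
// Hypothetical minimal Variant for this sketch; not Hail's Variant class.
case class Variant(contig: String, start: Int, ref: String, alt: String)

object VariantParse {
  // Parses "contig:start:ref:alt", e.g. "1:12345:A:T".
  // Returns None for malformed input instead of throwing.
  def parse(s: String): Option[Variant] = s.split(":") match {
    case Array(contig, pos, ref, alt) if contig.nonEmpty && ref.nonEmpty && alt.nonEmpty =>
      // A non-numeric position also yields None.
      scala.util.Try(pos.toInt).toOption.map(p => Variant(contig, p, ref, alt))
    case _ => None
  }
}
```

Returning Option rather than throwing keeps one malformed line from killing a whole job, which matters mostly at the table import/export boundary mentioned above.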
Laurent Francioli
@lfrancioli
Aug 04 2016 15:53
In my case I’m going from an RDD of Variant, … to a DataFrame or Row with a schema that needs to be of DataType
maybe I can find another way of getting the Variant into it
and then back to an RDD that I want to join
I can also join on v.toString though
Tim Poterba
@tpoterba
Aug 04 2016 15:54
don’t go through String
go through the variant row
variant.toRow with Variant.schema
Laurent Francioli
@lfrancioli
Aug 04 2016 15:55
good point!
yeah, that’s actually much better, thanks!
Tim Poterba
@tpoterba
Aug 04 2016 15:55
:smiley:
Laurent Francioli
@lfrancioli
Aug 04 2016 17:30
Is there already a StructType => Type conversion
somewhere?
Laurent Francioli
@lfrancioli
Aug 04 2016 17:38
Never mind, found it!
I’ll start getting good with Impex, I swear :)
Mitja Kurki
@Fedja
Aug 04 2016 20:03
Do you have a suggestion for a git workflow? I have my own fork of broad/hail, and I need stuff from laurent/hail that is not in there. What would be a good way to keep my dev branch up to date with the changes from both upstreams, given that I want to submit a pull request at some point?
Tim Poterba
@tpoterba
Aug 04 2016 20:04
you should always rebase onto broad/hail
we won’t accept PRs that have merge commits
(you’ll have to rebase those when you make a PR)
this is a good question though.
Mitja Kurki
@Fedja
Aug 04 2016 20:14
Yea… rebasing for sane history is a must. Squashing merge commits into one before PR? I have diverged quite a lot and rebasing is a bit of a mess currently
Tim Poterba
@tpoterba
Aug 04 2016 20:14
if you’re going to submit a PR, you probably want to submit it as 1 commit
ask @jigold how to do this, she recently did it for the bgen branch
Mitja Kurki
@Fedja
Aug 04 2016 20:21
ok thanks.
Mitja Kurki
@Fedja
Aug 04 2016 22:35
Any idea what might suddenly be causing this when running dist hail locally?
../dev_branches/hail/build/install/hail/bin/hail importvcf ../dev_branches/hail/src/test/resources/raft.vcf
Exception in thread "main" java.net.BindException: Failed to bind to: /10.17.160.173:0: Service 'sparkDriver' failed after 16 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
Tim Poterba
@tpoterba
Aug 04 2016 22:35
are you on vpn?
Mitja Kurki
@Fedja
Aug 04 2016 22:36
running locally on my laptop
Tim Poterba
@tpoterba
Aug 04 2016 22:36
yeah but are you on the broad VPN?
Mitja Kurki
@Fedja
Aug 04 2016 22:36
nope
Tim Poterba
@tpoterba
Aug 04 2016 22:36
weird
Mitja Kurki
@Fedja
Aug 04 2016 22:36
how would that affect it?
Tim Poterba
@tpoterba
Aug 04 2016 22:36
I get this error when I’m on the vpn
it has something to do with the localhost binding name
Mitja Kurki
@Fedja
Aug 04 2016 22:37
ah…
I’m attached to ethernet inside MGH network
Tim Poterba
@tpoterba
Aug 04 2016 22:38
that could have something to do with it
I don’t know though
@cseed: my experiment is annotating a profile225 VDS with an ExAC sites VDS, then running exportvariants, on my laptop.
current master: exportvariants: 8m27.5s
ordered branch: exportvariants: 8.366s
Jon Bloom
@jbloom22
Aug 04 2016 22:42
60x :)
Tim Poterba
@tpoterba
Aug 04 2016 22:42
yep
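The 60x figure above follows from converting both exportvariants timings to seconds:

```latex
\frac{8\,\text{min}\;27.5\,\text{s}}{8.366\,\text{s}}
  = \frac{507.5\,\text{s}}{8.366\,\text{s}}
  \approx 60.7
```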
Mitja Kurki
@Fedja
Aug 04 2016 22:47
Seems to be fixed when switching to wifi… damn weird. Do you have any idea how it chooses the Spark master URL?
Tim Poterba
@tpoterba
Aug 04 2016 22:47
nope
Mitja Kurki
@Fedja
Aug 04 2016 22:48
thanks anyway… would not have thought about that ever
Tim Poterba
@tpoterba
Aug 04 2016 22:48
hehe
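The BindException above means the Spark driver could not bind to 10.17.160.173, presumably the address it picked for the laptop on the wired MGH network. As we understand it, Spark's default driver address starts from the same lookup the JVM exposes as `InetAddress.getLocalHost`. A small java.net diagnostic can show what that resolves to and whether a socket can actually bind there. This is a sketch, and `BindCheck` is a made-up name. If the resolved address fails to bind, pinning Spark to loopback with the `SPARK_LOCAL_IP` environment variable (e.g. `SPARK_LOCAL_IP=127.0.0.1`) is a commonly used workaround. It was not tried in this thread, so treat it as an assumption.

```scala
import java.net.{InetAddress, InetSocketAddress, ServerSocket}
import scala.util.{Failure, Success, Try}

object BindCheck {
  // Try to bind an ephemeral port (port 0) on `addr`; true if the bind succeeds.
  def canBind(addr: InetAddress): Boolean =
    Try {
      val socket = new ServerSocket()
      try socket.bind(new InetSocketAddress(addr, 0))
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // What the JVM thinks this machine's address is (may differ per network).
    Try(InetAddress.getLocalHost) match {
      case Success(local) =>
        println(s"local host resolves to ${local.getHostName} / ${local.getHostAddress}")
        println(s"can bind there: ${canBind(local)}")
      case Failure(e) =>
        println(s"could not resolve the local host: $e")
    }
    println(s"can bind on loopback: ${canBind(InetAddress.getLoopbackAddress)}")
  }
}
```

Running it once on ethernet and once on wifi would show whether the resolved address changes between the two, which would be consistent with the behavior described above.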
Tim Poterba
@tpoterba
Aug 04 2016 23:26
@cseed: any thoughts about back-compatibility of the ordered stuff?
it's easy to implement so that it can read old VDS files, but then people may never read/write them