These are chat archives for broadinstitute/hail

8th
Aug 2016
Laurent Francioli
@lfrancioli
Aug 08 2016 18:25
Has anyone already tried the Variant.toRow method in an SQL context?
Tim Poterba
@tpoterba
Aug 08 2016 18:25
what do you mean?
Laurent Francioli
@lfrancioli
Aug 08 2016 18:26
I’m getting a scala.MatchError when using the schema I got from Variant.schema
Tim Poterba
@tpoterba
Aug 08 2016 18:26
can you post the code you’re trying to run?
Laurent Francioli
@lfrancioli
Aug 08 2016 18:27
val schema = StructType(
  Variant.schema.fields ++
    Array(StructField(options.training.replace(".", "_"), StringType, true)) ++
    feature_queriers.map({ case (f, t, q) => StructField(f, t.asInstanceOf[Type].schema, true) })
)

// Convert RDD records to Row
val rowRDD = state.vds.rdd.map({
  case (v, va, gs) => Row(
    v.toRow.toSeq ++
      Array(label_querier._2(va) match {
        case Some(x) => x.asInstanceOf[Boolean].toString
        case None => null
      })
      ++ feature_queriers.map({
      case (f, t, q) => q(va) match {
        case Some(x) => t match {
          case TDouble => x.asInstanceOf[Double]
          case TInt => x.asInstanceOf[Int]
          case TString => x.asInstanceOf[String]
          case TFloat => x.asInstanceOf[Float]
          case TLong => x.asInstanceOf[Long]
          case TBoolean => x.asInstanceOf[Boolean]
          case _ => x.toString //TODO FIXME this whole thing is ugly!
        }
        case None => null
      }
    })
  )
})
val vaSchemaDF = sqlContext.createDataFrame(rowRDD, schema)
info("#variants before filtering: %d".format(vaSchemaDF.count()))
let me add 2 more lines so we get to the point where the exception happens
Tim Poterba
@tpoterba
Aug 08 2016 18:27
why do you have to flatten the schema?
Laurent Francioli
@lfrancioli
Aug 08 2016 18:28
I can’t find another way of appending more data to a Row
is there?
And this is the error I get: scala.MatchError: ArrayBuffer(7, 148701017, T, ArrayBuffer([T,C]), null, 12.62, 59.86, -0.292, 7.014, -2.0E-4)
Tim Poterba
@tpoterba
Aug 08 2016 18:30
I think you’re constructing a row wrong
try Row.fromSeq(…)
  def apply(values: Any*): Row = new GenericRow(values.toArray)
  def fromSeq(values: Seq[Any]): Row = new GenericRow(values.toArray)
Laurent Francioli
@lfrancioli
Aug 08 2016 18:31
If I comment out the part about Variant I get what I expect
Right, so this would potentially be a better way of constructing the part with the queriers?
ah no, I see what you mean
yeah, let me try that
Tim Poterba
@tpoterba
Aug 08 2016 18:32
why are you making things strings?
Laurent Francioli
@lfrancioli
Aug 08 2016 18:33
the label needs to be a String; the rest is just my way of not handling the stuff I don’t care about for now :)
Tim Poterba
@tpoterba
Aug 08 2016 18:33
ahh
Laurent Francioli
@lfrancioli
Aug 08 2016 18:33
As my comment indicates, not the right way though
As usual — you were right!
using Row.fromSeq(…) solved it!
Thanks!
Tim Poterba
@tpoterba
Aug 08 2016 18:37
yeah, it put an extra layer of Seq on it when you just did the apply
Laurent Francioli
@lfrancioli
Aug 08 2016 18:38
Ahhh, got it!
not so intuitive though
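
For anyone reading this archive later, the "extra layer of Seq" can be reproduced without Spark. The sketch below is only an illustration: MiniRow is a hypothetical stand-in that copies the two factory signatures quoted above (a varargs apply and a fromSeq), and the field values are made up. It is not Spark's Row and not Hail code.

```scala
// MiniRow is a hypothetical stand-in for Spark's Row, used only for illustration.
// It mirrors the two factory signatures quoted in the chat and nothing else.
final class MiniRow private (val values: Vector[Any]) {
  def length: Int = values.length
}

object MiniRow {
  // Varargs factory: every argument becomes one field of the row.
  def apply(values: Any*): MiniRow = new MiniRow(values.toVector)

  // Sequence factory: the elements of the sequence become the fields.
  def fromSeq(values: Seq[Any]): MiniRow = new MiniRow(values.toVector)
}

object RowDemo {
  def main(args: Array[String]): Unit = {
    // Made-up values standing in for v.toRow.toSeq and the appended annotations.
    val variantFields: Seq[Any] = Seq("7", 148701017, "T")
    val extras: Seq[Any] = Seq(null, 12.62)
    val fields: Seq[Any] = variantFields ++ extras

    // The Seq itself is a valid Any, so varargs treats it as ONE argument.
    val wrapped = MiniRow(fields)

    // fromSeq unpacks the sequence into individual fields.
    val flat = MiniRow.fromSeq(fields)

    // Splatting with `: _*` (Scala 2 syntax) also passes the elements one by one.
    val splatted = MiniRow(fields: _*)

    println(wrapped.length)  // 1
    println(flat.length)     // 5
    println(splatted.length) // 5
  }
}
```

With the one-field row, a schema that expects five columns no longer lines up with the data, which is consistent with the MatchError reported above showing the whole ArrayBuffer as a single value.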