    Thomas Dyar
    quick question: does the SNAP alignment in avocado run in parallel over a single sample within a single input BAM file? Wondering if using avocado for alignment / preprocessing will help turnaround time for our per-run qc pipeline?
    Allen Day
    I'm getting this error from avocado: "java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to org.bdgenomics.formats.avro.NucleotideContigFragment" using config
    Allen Day
    I see from the parquet metadata that the reference chromosomes are of class NucleotideContigFragment.
    so I interpret this to mean that the reads, which are of class AlignmentRecord, are being converted at some point to GenericData$Record, which sounds erroneously generic.
    anyone else seen this? any idea, @fnothaft ?
    Andrew Chen
    hi allen! what command are you running to get this?
    Allen Day
    I've updated to HEAD of the avocado git repo
    problem shows up if I invoke like this:
    /path/to/avocado-submit /path/to/MT.bam.adam /path/to/human_g1k_v37.fasta.adam /path/to/out-avocado /path/to/avocado-sample-configs/
    however, if I use unconverted data, like:
    /path/to/avocado-submit /path/to/MT.bam /path/to/human_g1k_v37.fasta /path/to/out-avocado /path/to/avocado-sample-configs/
    Andrew Chen
    have you seen the response in the ADAM gitter? I think it may be because you're using an old reference .adam file.
    Allen Day
    ok, making some progress on this. it looks like you're right, the latest pull from adam repo and rebuilding old .adam files at least gets me past that error.
    Luca Pireddu
    Hello people. Is anyone having problems running avocado on large-ish datasets?
    though I've had success with small input, I haven't been able to get it to successfully complete a job on anything larger than about 20 GB
    Erin Jerri Pangilinan
    does anyone actually use Apache Drill over Apache Spark for anything in big data genomics? i was thinking not. there's a training at MapR next week on it (very introductory). I'm familiar with Spark but not w/ Drill, and would like to know folks' thoughts here on what actual practitioners use and prefer, and why
    Erin Jerri Pangilinan
    ah nm, seems just like another add-on
    Khaled Nasri
    can you share avocado-submit command?
    17/07/06 16:51:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on (size: 30.3 KB, free: 366.3 MB)
    17/07/06 16:51:00 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at ADAMContext.scala:376
    17/07/06 16:51:00 WARN BiallelicGenotyper: Input RDD is not persisted. Performance may be degraded.
    Command body threw exception:
    java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
    17/07/06 16:51:00 INFO BiallelicGenotyper: Overall Duration: 3.41 secs
    Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
    at org.bdgenomics.avocado.genotyping.DiscoverVariants$.variantsInRdd(DiscoverVariants.scala:83)
    at org.bdgenomics.avocado.genotyping.DiscoverVariants$$anonfun$apply$1.apply(DiscoverVariants.scala:56)
    at org.bdgenomics.avocado.genotyping.DiscoverVariants
    at scala.Option.fold(Option.scala:158)
    at org.apache.spark.rdd.Timer.time(Timer.scala:48)
    at org.bdgenomics.avocado.genotyping.DiscoverVariants$.apply(DiscoverVariants.scala:54)
    at org.bdgenomics.avocado.genotyping.BiallelicGenotyper$.discoverAndCall(BiallelicGenotyper.scala:153)
    at org.bdgenomics.avocado.cli.BiallelicGenotyper$$anonfun$4.apply(BiallelicGenotyper.scala:228)
    at org.bdgenomics.avocado.cli.BiallelicGenotyper
    at scala.Option.fold(Option.scala:158)
    at org.bdgenomics.utils.cli.BDGSparkCommand$
    at org.bdgenomics.avocado.cli.AvocadoMain$$anonfun$run$3.apply(AvocadoMain.scala:75)
    at org.bdgenomics.avocado.cli.AvocadoMain
    at scala.Option.fold(Option.scala:158)
    at org.bdgenomics.avocado.cli.AvocadoMain$.main(AvocadoMain.scala:26)
    at org.bdgenomics.avocado.cli.AvocadoMain.main(AvocadoMain.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    17/07/06 16:51:00 INFO SparkContext: Invoking stop() from shutdown hook
    17/07/06 16:51:00 INFO ServerConnector: Stopped ServerConnector@84bbff{HTTP/1.1}{}
    17/07/06 16:51:00 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6db66836{/stages/stage/kill,null,UNAVAILABLE}
    17/07/06 16:51:00 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@3574e198{/jobs/job/kill,null,UNAVAILABLE}
    17/07/06 16:51:00 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@27e0f2f5{/api,null,UNAVAILABLE}
    17/07/06 16:51:00 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@9cd25ff{/,null,UNAVAILABLE}
    17/07/06 16:51:00 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@69f63d95{/static,null,UNAVAILABLE}
    17/07/06 16:51:00 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@660e9100{/executors/threadDump/json,null,UNAVAILABLE}
    17/07/06 16:51:00 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6928f576{/executors/threadDump,null,UNAVAILABL
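    A `NoSuchMethodError` on a `scala.*` class, like the one in the trace above, often points to a jar built against a different Scala binary version than the one on the runtime classpath. Whether that is the cause here is an assumption; nothing in the log confirms it. As a minimal sketch, the hypothetical helper below (`scala_linkage_errors` is not part of avocado or ADAM) scans pasted log text for such errors so they can be spotted quickly in long Spark output:

```python
import re

# Matches a JVM NoSuchMethodError line and captures the owning class
# and the name of the method that could not be resolved.
_NO_SUCH_METHOD = re.compile(
    r"java\.lang\.NoSuchMethodError:\s+([\w.$]+)\.(\w+)\("
)


def scala_linkage_errors(log_text):
    """Return sorted, de-duplicated (class, method) pairs for every
    NoSuchMethodError in log_text whose owning class is under scala.*."""
    hits = set()
    for line in log_text.splitlines():
        match = _NO_SUCH_METHOD.search(line)
        if match and match.group(1).startswith("scala."):
            hits.add((match.group(1), match.group(2)))
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        'Exception in thread "main" java.lang.NoSuchMethodError: '
        "scala.reflect.api.JavaUniverse.runtimeMirror"
        "(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;"
    )
    for cls, method in scala_linkage_errors(sample):
        print(f"unresolved Scala method: {cls}.{method}")
```

    If the helper reports a hit, a reasonable next step is to compare the Scala version the Spark installation ships with against the one avocado was built for.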
    Peter van 't Hof

    I have some questions about this file

    Why are all methods here private? This way I can't use it as a library inside a full in-memory pipeline
    maybe I'm missing a special api file? ;)