    Daniel King
    @danking
    ok
    I’ll investigate
    Laurent Francioli
    @lfrancioli
    wait, are you sure there are 1k partitions?
    Daniel King
    @danking
    is there anything in between the import_table & write and the block above?
    Laurent Francioli
    @lfrancioli
    Laurent:hail2 laurent$ gsutil ls gs://gnomad/variant_qc/temp/friedman_cnn_scores.no_chr17.ht/rows/parts/
    gs://gnomad/variant_qc/temp/friedman_cnn_scores.no_chr17.ht/rows/parts/
    gs://gnomad/variant_qc/temp/friedman_cnn_scores.no_chr17.ht/rows/parts/part-0-2-0-0-4c1e0ba4-7fc3-be60-93ab-7160eaea2afa
    gtiao
    @gtiao
    def main():
        hl.init(log='/variantqc.log')
    
        ht = hl.import_table('gs://gnomad/variant_qc/temp/friedman_cnn_scores.tsv.gz', force_bgz=True, min_partitions=1000, impute=True)
        ht.write('gs://gnomad/variant_qc/temp/friedman_cnn_scores.no_chr17.ht', overwrite=True)
    
        ht = hl.read_table('gs://gnomad/variant_qc/temp/friedman_cnn_scores.no_chr17.ht')
        ht = ht.annotate(alt_alleles=ht.Alt.split(','))  # This transforms to a list
        ht = ht.explode('alt_alleles')
        ht = ht.annotate(locus=hl.locus(hl.str(ht.Contig), ht.Pos))
    
        # Apply minrep
        ht = ht.annotate(alleles=hl.min_rep(ht.locus, [ht.Ref, ht.alt_alleles])[1])
    
        # Add variant_type
        ht = ht.annotate(vartype=add_variant_type(ht.alleles))
        ht = ht.transmute(variant_type=ht.vartype.variant_type, n_alt_alleles=ht.vartype.n_alt_alleles)
    
        # Add rank
        print('Adding rank...')
        ht = add_rank(ht)
        ht.key_by('locus', 'alleles').write('gs://gnomad/variant_qc/temp/friedman_cnn_scores.no_chr17.ranked.ht', overwrite=True)
    Laurent Francioli
    @lfrancioli
    Not as familiar with the new Hail data format, but shouldn't there be 1k parts in there?
    gtiao
    @gtiao
    Where add_rank() is the function that contains the order_by code
    Daniel King
    @danking
    uh
    Konrad Karczewski
    @konradjk
    yeah, might need to set the min_block_size
    Daniel King
    @danking
    there should be more than one anyway
    Konrad Karczewski
    @konradjk
    hl.init(min_block_size=0)
    @lfrancioli is correct
    Laurent Francioli
    @lfrancioli
    Isn't the default min_block_size 1 MB?
    from the doc that's what I see
    but maybe the doc isn't accurate :)
    Daniel King
    @danking
    it’s definitely 1 and definitely measured in MB
    I just checked
    gtiao
    @gtiao
    Why does min block size matter here?
    Laurent Francioli
    @lfrancioli
    because the number of partitions depends on min_block_size (minimum partition size) and min_partitions (minimum number of partitions)
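A rough sketch of the arithmetic behind that statement, assuming Hail hands `min_partitions` and `min_block_size` to Hadoop-style split sizing, where split size = max(min size, min(goal size, filesystem block size)). The function name, the 128 MB filesystem block size, and the 500 MB file size are illustrative assumptions, not values from this thread or from Hail's code.

```python
import math

MB = 1024 * 1024


def estimate_partitions(total_bytes, min_partitions, min_block_size_mb=1,
                        fs_block_bytes=128 * MB, splittable=True):
    """Roughly estimate the partition count for one input file.

    Hypothetical helper: mirrors Hadoop-style split sizing, not Hail internals.
    """
    if not splittable:
        # A plain gzip stream cannot be split, so it lands in one partition.
        return 1
    # Target split size if the file were cut into min_partitions pieces.
    goal = total_bytes // max(min_partitions, 1)
    # Splits never go below the minimum block size (1 MB per the chat above).
    split = max(min_block_size_mb * MB, min(goal, fs_block_bytes))
    return max(1, math.ceil(total_bytes / split))


# Assumed 500 MB input: the 1 MB floor caps the count below the 1000 requested.
print(estimate_partitions(500 * MB, min_partitions=1000))
# Same input with the floor removed (min_block_size=0).
print(estimate_partitions(500 * MB, min_partitions=1000, min_block_size_mb=0))
# Same input treated as unsplittable.
print(estimate_partitions(500 * MB, min_partitions=1000, splittable=False))
```

Under these assumptions, a single partition is only expected when the input is treated as an unsplittable stream; neither parameter alone would collapse a large splittable file to one part.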
    Konrad Karczewski
    @konradjk
    oh didn't realize it was 1 MB, i thought it was larger
    but even then yeah that doesn't explain it
    is min_partitions in import_table not working?
    Laurent Francioli
    @lfrancioli
    that could explain it :)
    Daniel King
    @danking
    @gtiao it looks like you might be loading all the data into one partition?
    gtiao
    @gtiao
    I thought that was what I was trying to avoid by ht = hl.import_table('gs://gnomad/variant_qc/temp/friedman_cnn_scores.tsv.gz', force_bgz=True, min_partitions=1000, impute=True)
    Daniel King
    @danking
    min_block_size controls how small input file blocks can be
    I also think that would avoid it.
    gtiao
    @gtiao
    Maybe force_bgz doesn’t work with min_partitions?
    It is actually bgzipped
    Daniel King
    @danking
    lemme spin up a little cluster to poke at that file
    Daniel King
    @danking
    blah. we need to make 0.2 the default; it's such a long turnaround time if I forget
    Daniel King
    @danking
    so it's definitely being loaded in one partition
    which is obviously bad and wrong, and I’ll try to figure out why
    gtiao
    @gtiao
    OK, cool — thanks for looking into it!
    Daniel King
    @danking
    @gtiao you’re on latest master right?
    Daniel King
    @danking
    @gtiao yeah it’s force_bgz being broken somehow, if you can rename the file that will bypass the issue for now
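A minimal sketch of the rename workaround, assuming the intent is to give the file a `.bgz` extension so Hail recognizes it as block-gzipped without `force_bgz`. The helper `bgz_path` is hypothetical, and the `import_table` call is left as a comment because it needs a running Hail session.

```python
def bgz_path(path: str) -> str:
    """Return the path with a trailing '.gz' swapped for '.bgz' (hypothetical helper)."""
    if path.endswith('.gz'):
        return path[:-len('.gz')] + '.bgz'
    return path


src = 'gs://gnomad/variant_qc/temp/friedman_cnn_scores.tsv.gz'
dst = bgz_path(src)

# Command to run by hand to rename the object in the bucket.
print(f'gsutil mv {src} {dst}')

# Then import from the renamed path, dropping force_bgz:
# ht = hl.import_table(dst, min_partitions=1000, impute=True)
```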
    gtiao
    @gtiao
    Great — I will do that. I’ve been using a Konrad jar (gs://konradk/jars/hail-6d4d50458.jar) but I don’t recall what the specific issue was that we were trying to address with that
    Konrad Karczewski
    @konradjk
    am i to presume that if i see this in my file:
     'gs://gnomad/annotations/hail-0.2/ht/exomes/gnomad.exomes.family_stats.ht/rows/parts/part-04249-15-4249-0-0e67bae1-c1d2-5e25-aad1-eb8c419bbdbe',
     'gs://gnomad/annotations/hail-0.2/ht/exomes/gnomad.exomes.family_stats.ht/rows/parts/part-04249-15-4249-1-8e47c4bd-6382-0449-a9c9-7f3d29ea1511',
    that the later one is the correct one?
    klaricch
    @klaricch
    any thoughts how to follow up on the error below? ld_prune had worked on an exome matrix table but then I joined it with data from an array matrix table and lost a lot of variants and kept only GT as an entry field. not sure if i need to skip that join.
    mm_test = hl.ld_prune(mm.GT,r2=0.1)
    FatalError: ArrayIndexOutOfBoundsException: 6
    
    Java stack trace:
    java.lang.ArrayIndexOutOfBoundsException: 6
        at is.hail.methods.LocalLDPrune$.apply(LocalLDPrune.scala:294)
        at is.hail.methods.LocalLDPrune.apply(LocalLDPrune.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
    
    Hail version: devel-15eaf7588401
    Error summary: ArrayIndexOutOfBoundsException: 6
    cseed
    @cseed
    @klaricch It looks like a bug on our end. Can you open an issue on the GitHub repo? Thanks!
    It looks like it should be straightforward to fix.
    maccum
    @maccum
    hail-is/hail#3735 should fix that bug once it goes in @klaricch
    klaricch
    @klaricch
    ok thanks!
    Laurent Francioli
    @lfrancioli
    I'm having trouble with an AssertionError on Tables:
    ht.describe()
    ----------------------------------------
    Global fields:
        None
    ----------------------------------------
    Row fields:
        'v1_idx': int32 
        'v2_idx': int32 
    ----------------------------------------
    Key: ['v1_idx', 'v2_idx']
    ----------------------------------------
    ht.show()
    Traceback (most recent call last):
      File "/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-53-73b5a6c78295>", line 1, in <module>
        ht.show()
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/typecheck/check.py", line 547, in wrapper
        return f(*args_, **kwargs_)
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/table.py", line 1169, in show
        print(self._show(n,width, truncate, types))
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/table.py", line 1172, in _show
        return self._jt.showString(n, joption(truncate), types, width)
      File "/Users/laurent/tools/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/utils/java.py", line 196, in deco
        'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
    hail.utils.java.FatalError: AssertionError: assertion failed
    Java stack trace:
    java.lang.AssertionError: assertion failed
        at scala.Predef$.assert(Predef.scala:156)
        at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:78)
        at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:7)
        at is.hail.expr.ir.Emit$.emit(Emit.scala:42)
        at is.hail.expr.ir.Emit$.apply(Emit.scala:28)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:49)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:31)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:62)
        at is.hail.expr.TableExplode.execute(Relational.scala:2201)
        at is.hail.expr.TableUnkey.execute(Relational.scala:1883)
        at is.hail.expr.TableMapRows.execute(Relational.scala:2090)
        at is.hail.expr.TableKeyBy.execute(Relational.scala:1846)
        at is.hail.expr.TableMapRows.execute(Relational.scala:2090)
        at is.hail.table.Table.value$lzycompute(Table.scala:243)
        at is.hail.table.Table.value(Table.scala:238)
        at is.hail.table.Table.x$5$lzycompute(Table.scala:246)
        at is.hail.table.Table.x$5(Table.scala:246)
        at is.hail.table.Table.rvd$lzycompute(Table.scala:246)
        at is.hail.table.Table.rvd(Table.scala:246)
        at is.hail.table.Table.take(Table.scala:961)
        at is.hail.table.Table.showString(Table.scala:1002)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
    Hail version: devel-10a75bb57a6f
    Error summary: AssertionError: assertion failed
    clues?
    gtiao
    @gtiao
    If I’ve left out a header on an imported file, can I add colnames by doing table_result = table1.rename({'C1': 'newcolname1', 'C2': 'newcolname2'})?
    maccum
    @maccum
    @lfrancioli that’s a bug. can you make an issue on GitHub?