    Daniel King
    @danking
    I just checked
    gtiao
    @gtiao
    Why does min block size matter here?
    Laurent Francioli
    @lfrancioli
    because the number of partitions depends on min_block_size (minimum partition size) and min_partitions (minimum number of partitions)
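A rough mental model of the interaction described above, assuming Hadoop-style split sizing underneath the import. The function name, the 128 MiB filesystem block default, and the `splittable` flag are illustrative assumptions, not part of Hail's API:

```python
def estimate_partitions(total_bytes, min_partitions, min_block_size,
                        fs_block_size=128 * 2**20, splittable=True):
    """Rough estimate of how many partitions a text import would yield.

    Assumption: split size = max(min_block_size, min(goal_size, fs_block_size)),
    where goal_size = total_bytes / min_partitions (Hadoop-style sizing).
    """
    if not splittable:
        # A file the reader cannot split (e.g. plain gzip) ends up in one partition,
        # no matter what min_partitions asks for.
        return 1
    goal_size = max(1, total_bytes // max(1, min_partitions))
    split_size = max(min_block_size, min(goal_size, fs_block_size))
    return -(-total_bytes // split_size)  # ceiling division


if __name__ == "__main__":
    size = 1000 * 2**20  # hypothetical 1000 MiB input
    print(estimate_partitions(size, 1000, 2**20))                    # small blocks allowed
    print(estimate_partitions(size, 1000, 500 * 2**20))              # large minimum block size
    print(estimate_partitions(size, 1000, 2**20, splittable=False))  # unsplittable file
```

Under this model a large `min_block_size` caps the partition count even when `min_partitions` is high, and an unsplittable file always lands in one partition.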
    Konrad Karczewski
    @konradjk
    oh didn't realize it was 1 mb, i thought it was larger
    but even then yeah that doesn't explain it
    is min_partitions in import_table not working?
    Laurent Francioli
    @lfrancioli
    that could explain it :)
    Daniel King
    @danking
    @gtiao it looks like you might be loading all the data into one partition?
    gtiao
    @gtiao
    I thought that was what I was trying to avoid by ht = hl.import_table('gs://gnomad/variant_qc/temp/friedman_cnn_scores.tsv.gz', force_bgz=True, min_partitions=1000, impute=True)
    Daniel King
    @danking
    min_block_size controls how small input file blocks can be
    I also think that would avoid it.
    gtiao
    @gtiao
    Maybe force_bgz doesn’t work with min_partitions?
    It is actually bgzipped
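One way to double-check that a `.gz` file really is block-gzipped is to look for the BGZF extra subfield (`BC`) in the first gzip header. This is a minimal sketch using only the standard library; the helper names are made up for illustration, and it reads a local copy of the file rather than a `gs://` path:

```python
def looks_like_bgzf(header: bytes) -> bool:
    """Return True if the bytes start with a gzip member carrying the BGZF 'BC' subfield."""
    # gzip magic (1f 8b), deflate method (08), and the FEXTRA flag (0x04) must be present.
    if len(header) < 12 or header[0:3] != b"\x1f\x8b\x08" or not (header[3] & 0x04):
        return False
    xlen = int.from_bytes(header[10:12], "little")
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra-field subfields: SI1, SI2, 2-byte little-endian length, payload.
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        slen = int.from_bytes(extra[pos + 2:pos + 4], "little")
        if (si1, si2, slen) == (66, 67, 2):  # 'B', 'C', 2-byte block size
            return True
        pos += 4 + slen
    return False


def file_looks_like_bgzf(path: str) -> bool:
    """Check the first bytes of a local file."""
    with open(path, "rb") as f:
        return looks_like_bgzf(f.read(1024))
```

A plain gzip file normally has no extra field at all, so it fails the FEXTRA check immediately.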
    Daniel King
    @danking
    lemme spin up a little cluster to poke at that file
    Daniel King
    @danking
    blah. we need to make 0.2 the default; it's such a long turnaround time if I forget
    Daniel King
    @danking
    so it's definitely being loaded in one partition
    which is obviously bad and wrong, and I’ll try to figure out why
    gtiao
    @gtiao
    OK, cool — thanks for looking into it!
    Daniel King
    @danking
    @gtiao you’re on latest master right?
    Daniel King
    @danking
    @gtiao yeah it’s force_bgz being broken somehow, if you can rename the file that will bypass the issue for now
    gtiao
    @gtiao
    Great — I will do that. I’ve been using a Konrad jar (gs://konradk/jars/hail-6d4d50458.jar) but I don’t recall what the specific issue was that we were trying to address with that
    Konrad Karczewski
    @konradjk
    am i to presume that if i see this in my file:
     'gs://gnomad/annotations/hail-0.2/ht/exomes/gnomad.exomes.family_stats.ht/rows/parts/part-04249-15-4249-0-0e67bae1-c1d2-5e25-aad1-eb8c419bbdbe',
     'gs://gnomad/annotations/hail-0.2/ht/exomes/gnomad.exomes.family_stats.ht/rows/parts/part-04249-15-4249-1-8e47c4bd-6382-0449-a9c9-7f3d29ea1511',
    that the later one is the correct one?
    klaricch
    @klaricch
    any thoughts how to follow up on the error below? ld_prune had worked on an exome matrix table but then I joined it with data from an array matrix table and lost a lot of variants and kept only GT as an entry field. not sure if i need to skip that join.
    mm_test = hl.ld_prune(mm.GT,r2=0.1)
    FatalError: ArrayIndexOutOfBoundsException: 6
    
    Java stack trace:
    java.lang.ArrayIndexOutOfBoundsException: 6
        at is.hail.methods.LocalLDPrune$.apply(LocalLDPrune.scala:294)
        at is.hail.methods.LocalLDPrune.apply(LocalLDPrune.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
    
    Hail version: devel-15eaf7588401
    Error summary: ArrayIndexOutOfBoundsException: 6
    cseed
    @cseed
    @klaricch It looks like a bug on our end. Can you open an issue on the github repo? Thanks!
    It looks like it should be straightforward to fix.
    maccum
    @maccum
    hail-is/hail#3735 should fix that bug once it goes in @klaricch
    klaricch
    @klaricch
    ok thanks!
    Laurent Francioli
    @lfrancioli
    I'm having trouble with an AssertionError on Tables:
    ht.describe()
    ----------------------------------------
    Global fields:
        None
    ----------------------------------------
    Row fields:
        'v1_idx': int32 
        'v2_idx': int32 
    ----------------------------------------
    Key: ['v1_idx', 'v2_idx']
    ----------------------------------------
    ht.show()
    Traceback (most recent call last):
      File "/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-53-73b5a6c78295>", line 1, in <module>
        ht.show()
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/typecheck/check.py", line 547, in wrapper
        return f(*args_, **kwargs_)
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/table.py", line 1169, in show
        print(self._show(n,width, truncate, types))
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/table.py", line 1172, in _show
        return self._jt.showString(n, joption(truncate), types, width)
      File "/Users/laurent/tools/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/utils/java.py", line 196, in deco
        'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
    hail.utils.java.FatalError: AssertionError: assertion failed
    Java stack trace:
    java.lang.AssertionError: assertion failed
        at scala.Predef$.assert(Predef.scala:156)
        at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:78)
        at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:7)
        at is.hail.expr.ir.Emit$.emit(Emit.scala:42)
        at is.hail.expr.ir.Emit$.apply(Emit.scala:28)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:49)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:31)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:62)
        at is.hail.expr.TableExplode.execute(Relational.scala:2201)
        at is.hail.expr.TableUnkey.execute(Relational.scala:1883)
        at is.hail.expr.TableMapRows.execute(Relational.scala:2090)
        at is.hail.expr.TableKeyBy.execute(Relational.scala:1846)
        at is.hail.expr.TableMapRows.execute(Relational.scala:2090)
        at is.hail.table.Table.value$lzycompute(Table.scala:243)
        at is.hail.table.Table.value(Table.scala:238)
        at is.hail.table.Table.x$5$lzycompute(Table.scala:246)
        at is.hail.table.Table.x$5(Table.scala:246)
        at is.hail.table.Table.rvd$lzycompute(Table.scala:246)
        at is.hail.table.Table.rvd(Table.scala:246)
        at is.hail.table.Table.take(Table.scala:961)
        at is.hail.table.Table.showString(Table.scala:1002)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
    Hail version: devel-10a75bb57a6f
    Error summary: AssertionError: assertion failed
    clues?
    gtiao
    @gtiao
    If I’ve left out a header on an imported file, can I add colnames by doing table_result = table1.rename({'C1': 'newcolname1', 'C2': 'newcolname2'})?
    maccum
    @maccum
    @lfrancioli that’s a bug. can you make an issue in github?
    Laurent Francioli
    @lfrancioli
    :thumbsup:
    Konrad Karczewski
    @konradjk
    @lfrancioli what was your pipeline?
    Laurent Francioli
    @lfrancioli
        def variant_pairs_ht(mt, row_groups):
            mt = mt.add_row_index(name='_row_idx')
            mt = mt.add_col_index(name='_col_idx')
    
            mt = mt.group_rows_by(*row_groups).aggregate(
                variant_pairs_entry=hl.set(
                    hl.agg.collect(
                        hl.tuple([mt._col_idx, mt._row_idx])
                    )
                        # [col_idx, [row_idx]]
                        .group_by(lambda x: x[0])
                        # [[(col_idx, row_idx), (col_idx, row_idx), ...],  ...]
                        .values()
                        # [[row_idx, row_idx, ...],  ...]
                        .map(lambda x: x.map(lambda y: y[0]))
                        .flatmap(lambda x: hl.range(0, hl.len(x))
                                   .flatmap(lambda i1: hl.range(i1 + 1, hl.len(x))
                                            .map(lambda i2: hl.tuple([x[i1], x[i2]]))))
                )
            )
    
            mt.describe()
    
            ht = mt.annotate_rows(
                variant_pairs=hl.agg.take(mt.variant_pairs_entry, 1)[0]
            ).rows()
    
            ht = ht.explode('variant_pairs')
            ht = ht.key_by(v1_idx=ht.variant_pairs[0], v2_idx=ht.variant_pairs[1])
            return ht.select()
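Going by the inline comments and the final `v1_idx`/`v2_idx` key, the pipeline appears intended to emit every pair of row indices that share a row group. A plain-Python sketch of that apparent intent follows; it is not the Hail pipeline itself, and `variant_pairs` and its input shape are hypothetical:

```python
from collections import defaultdict
from itertools import combinations


def variant_pairs(rows):
    """rows: iterable of (row_idx, group_key) tuples.

    Returns the set of (v1_idx, v2_idx) pairs of rows that share a group,
    with v1 appearing before v2 in the input order.
    """
    by_group = defaultdict(list)
    for row_idx, group_key in rows:
        by_group[group_key].append(row_idx)

    pairs = set()
    for idxs in by_group.values():
        # combinations(idxs, 2) yields the same pairs as the nested
        # range(0, len(x)) / range(i1 + 1, len(x)) flatmaps.
        pairs.update(combinations(idxs, 2))
    return pairs


if __name__ == "__main__":
    print(sorted(variant_pairs([(0, "geneA"), (1, "geneA"), (2, "geneB"), (3, "geneA")])))
```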
    Konrad Karczewski
    @konradjk
    good lord
    gtiao
    @gtiao

    I just got a warning on a pipeline:

    UserWarning: The mt[<row keys>, :] syntax is deprecated, and will be removed before 0.2 release.
      Use one of the following instead:
        mt.rows()[<row keys>]
        mt.index_rows(<row keys>)
      ht = ht.annotate(info=hl.struct(AC=mt[ht.key, :].info.AC, a_index=mt[ht.key, :].a_index))

    Does this mean the new syntax should read:

    ht = ht.annotate(info=hl.struct(AC=mt.rows()[ht.key].info.AC, a_index=mt.rows()[ht.key].a_index))

    ?

    Konrad Karczewski
    @konradjk
    yes, but don't do that yet
    gtiao
    @gtiao
    Why not?
    Konrad Karczewski
    @konradjk
    table joins are still not quite as fast as MT
    gtiao
    @gtiao
    So keep till it breaks?
    Konrad Karczewski
    @konradjk
    yes
    gtiao
    @gtiao
    Is there a way to join two VCFs in Hail 0.2?
    outer join
    basically, merge
    Konrad Karczewski
    @konradjk
    outer join on what?
    samples, variants, or both?
    gtiao
    @gtiao
    Both
    Konrad Karczewski
    @konradjk
    i don't think so
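For reference, this is what an outer join on both samples and variants would mean for two genotype matrices, sketched in plain Python rather than Hail. The nested-dict data layout, the function name, and the rule of preferring the first dataset's call when both have one are all assumptions made for illustration:

```python
def outer_join_genotypes(a, b):
    """a, b: dict mapping variant -> dict mapping sample -> genotype.

    Returns a matrix over the union of variants and the union of samples,
    with None where neither input has a call.
    """
    variants = sorted(set(a) | set(b))
    samples = sorted({s for d in (a, b) for row in d.values() for s in row})

    merged = {}
    for v in variants:
        row = {}
        for s in samples:
            # Prefer the call from `a`, fall back to `b`, otherwise missing.
            g = a.get(v, {}).get(s)
            if g is None:
                g = b.get(v, {}).get(s)
            row[s] = g
        merged[v] = row
    return merged
```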