    Daniel King
    @danking
    so it's definitely being loaded in one partition
    which is obviously bad and wrong, and I’ll try to figure out why
    gtiao
    @gtiao
    OK, cool — thanks for looking into it!
    Daniel King
    @danking
    @gtiao you’re on latest master right?
    Daniel King
    @danking
    @gtiao yeah it’s force_bgz being broken somehow, if you can rename the file that will bypass the issue for now
    gtiao
    @gtiao
    Great — I will do that. I’ve been using a Konrad jar (gs://konradk/jars/hail-6d4d50458.jar) but I don’t recall what the specific issue was that we were trying to address with that
    Konrad Karczewski
    @konradjk
    am i to presume that if i see this in my file:
     'gs://gnomad/annotations/hail-0.2/ht/exomes/gnomad.exomes.family_stats.ht/rows/parts/part-04249-15-4249-0-0e67bae1-c1d2-5e25-aad1-eb8c419bbdbe',
     'gs://gnomad/annotations/hail-0.2/ht/exomes/gnomad.exomes.family_stats.ht/rows/parts/part-04249-15-4249-1-8e47c4bd-6382-0449-a9c9-7f3d29ea1511',
    that the later one is the correct one?
    klaricch
    @klaricch
    any thoughts how to follow up on the error below? ld_prune had worked on an exome matrix table but then I joined it with data from an array matrix table and lost a lot of variants and kept only GT as an entry field. not sure if i need to skip that join.
    mm_test = hl.ld_prune(mm.GT,r2=0.1)
    FatalError: ArrayIndexOutOfBoundsException: 6
    
    Java stack trace:
    java.lang.ArrayIndexOutOfBoundsException: 6
        at is.hail.methods.LocalLDPrune$.apply(LocalLDPrune.scala:294)
        at is.hail.methods.LocalLDPrune.apply(LocalLDPrune.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
    
    Hail version: devel-15eaf7588401
    Error summary: ArrayIndexOutOfBoundsException: 6
    cseed
    @cseed
    @klaricch It looks like a bug on our end. Can you open an issue on the GitHub repo? Thanks!
    It looks like it should be straightforward to fix.
    maccum
    @maccum
    hail-is/hail#3735 should fix that bug once it goes in @klaricch
    klaricch
    @klaricch
    ok thanks!
    Laurent Francioli
    @lfrancioli
    I'm having trouble with an AssertionError on Tables:
    ht.describe()
    ----------------------------------------
    Global fields:
        None
    ----------------------------------------
    Row fields:
        'v1_idx': int32 
        'v2_idx': int32 
    ----------------------------------------
    Key: ['v1_idx', 'v2_idx']
    ----------------------------------------
    ht.show()
    Traceback (most recent call last):
      File "/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-53-73b5a6c78295>", line 1, in <module>
        ht.show()
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/typecheck/check.py", line 547, in wrapper
        return f(*args_, **kwargs_)
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/table.py", line 1169, in show
        print(self._show(n,width, truncate, types))
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/table.py", line 1172, in _show
        return self._jt.showString(n, joption(truncate), types, width)
      File "/Users/laurent/tools/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/Users/laurent/tools/hail-release/devel/hail.zip/hail/utils/java.py", line 196, in deco
        'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
    hail.utils.java.FatalError: AssertionError: assertion failed
    Java stack trace:
    java.lang.AssertionError: assertion failed
        at scala.Predef$.assert(Predef.scala:156)
        at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:78)
        at is.hail.expr.ir.TypeCheck$.apply(TypeCheck.scala:7)
        at is.hail.expr.ir.Emit$.emit(Emit.scala:42)
        at is.hail.expr.ir.Emit$.apply(Emit.scala:28)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:49)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:31)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:62)
        at is.hail.expr.TableExplode.execute(Relational.scala:2201)
        at is.hail.expr.TableUnkey.execute(Relational.scala:1883)
        at is.hail.expr.TableMapRows.execute(Relational.scala:2090)
        at is.hail.expr.TableKeyBy.execute(Relational.scala:1846)
        at is.hail.expr.TableMapRows.execute(Relational.scala:2090)
        at is.hail.table.Table.value$lzycompute(Table.scala:243)
        at is.hail.table.Table.value(Table.scala:238)
        at is.hail.table.Table.x$5$lzycompute(Table.scala:246)
        at is.hail.table.Table.x$5(Table.scala:246)
        at is.hail.table.Table.rvd$lzycompute(Table.scala:246)
        at is.hail.table.Table.rvd(Table.scala:246)
        at is.hail.table.Table.take(Table.scala:961)
        at is.hail.table.Table.showString(Table.scala:1002)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
    Hail version: devel-10a75bb57a6f
    Error summary: AssertionError: assertion failed
    clues?
    gtiao
    @gtiao
    If I’ve left out a header on an imported file, can I add colnames by doing table_result = table1.rename({'C1': 'newcolname1', 'C2': 'newcolname2'})?
    maccum
    @maccum
    @lfrancioli that’s a bug. can you make an issue on GitHub?
    Laurent Francioli
    @lfrancioli
    :thumbsup:
    Konrad Karczewski
    @konradjk
    @lfrancioli what was your pipeline?
    Laurent Francioli
    @lfrancioli
        def variant_pairs_ht(mt, row_groups):
            mt = mt.add_row_index(name='_row_idx')
            mt = mt.add_col_index(name='_col_idx')
    
            mt = mt.group_rows_by(*row_groups).aggregate(
                variant_pairs_entry=hl.set(
                    hl.agg.collect(
                        hl.tuple([mt._col_idx, mt._row_idx])
                    )
                        # [col_idx, [row_idx]]
                        .group_by(lambda x: x[0])
                        # [[(col_idx, row_idx), (col_idx, row_idx), ...],  ...]
                        .values()
                        # [[row_idx, row_idx, ...],  ...]
                        .map(lambda x: x.map(lambda y: y[0]))
                        .flatmap(lambda x: hl.range(0, hl.len(x))
                                   .flatmap(lambda i1: hl.range(i1 + 1, hl.len(x))
                                            .map(lambda i2: hl.tuple([x[i1], x[i2]]))))
                )
            )
    
            mt.describe()
    
            ht = mt.annotate_rows(
                variant_pairs=hl.agg.take(mt.variant_pairs_entry, 1)[0]
            ).rows()
    
            ht = ht.explode('variant_pairs')
            ht = ht.key_by(v1_idx=ht.variant_pairs[0], v2_idx=ht.variant_pairs[1])
            return ht.select()
    Konrad Karczewski
    @konradjk
    good lord
    gtiao
    @gtiao

    I just got a warning on a pipeline:

    UserWarning: The mt[<row keys>, :] syntax is deprecated, and will be removed before 0.2 release.
      Use one of the following instead:
        mt.rows()[<row keys>]
        mt.index_rows(<row keys>)
      ht = ht.annotate(info=hl.struct(AC=mt[ht.key, :].info.AC, a_index=mt[ht.key, :].a_index))

    Does this mean the new syntax should read:

    ht = ht.annotate(info=hl.struct(AC=mt.rows()[ht.key].info.AC, a_index=mt.rows()[ht.key].a_index))

    ?

    Konrad Karczewski
    @konradjk
    yes, but don't do that yet
    gtiao
    @gtiao
    Why not?
    Konrad Karczewski
    @konradjk
    table joins are still not quite as fast as MT
    gtiao
    @gtiao
    So keep till it breaks?
    Konrad Karczewski
    @konradjk
    yes
    gtiao
    @gtiao
    Is there a way to join two VCFs in Hail 0.2?
    outer join
    basically, merge
    Konrad Karczewski
    @konradjk
    outer join on what?
    samples, variants, or both?
    gtiao
    @gtiao
    Both
    Konrad Karczewski
    @konradjk
    i don't think so
    outer on one, inner on the other is possible with union_cols and union_rows
    gtiao
    @gtiao
    Is there a technical reason why you can’t do outer on both?
    “you” in the general sense, of course
    jkeebler
    @jkeebler
    an outer join requires some knowledge of each sample's non-carrier status (missing vs homozygous ref) for variants seen in one VDS but not the other. that's not possible without going back to the BAMs, incorporating the TileDB storage, or supporting gVCF, and I'm not sure where those are on the Hail roadmap
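To make the point above concrete, here is a toy sketch in plain Python (not Hail) of an outer join on both samples and variants. The dataset layout, sample names, and variant strings are made up for illustration, and the two inputs are assumed to have disjoint samples. Every sample/variant pair that neither input called comes out as `None`, because nothing in the inputs says whether that sample was homozygous reference or simply not genotyped there.

```python
def outer_merge(ds_a, ds_b):
    """Outer join on both samples and variants.

    Each dataset is a dict with 'samples', 'variants', and 'calls', where
    'calls' maps (sample, variant) -> genotype string. Pairs that neither
    input called are returned as None (unknown, not homozygous ref).
    """
    samples = sorted(set(ds_a["samples"]) | set(ds_b["samples"]))
    variants = sorted(set(ds_a["variants"]) | set(ds_b["variants"]))
    # Assumes the two inputs never call the same (sample, variant) pair.
    calls = {**ds_a["calls"], **ds_b["calls"]}
    return {(s, v): calls.get((s, v)) for s in samples for v in variants}


# Hypothetical inputs: one sample and one variant per "VCF".
vcf_a = {
    "samples": ["s1"],
    "variants": ["1:100:A:T"],
    "calls": {("s1", "1:100:A:T"): "0/1"},
}
vcf_b = {
    "samples": ["s2"],
    "variants": ["1:200:G:C"],
    "calls": {("s2", "1:200:G:C"): "1/1"},
}

merged = outer_merge(vcf_a, vcf_b)
# Pairs whose genotype cannot be recovered from the inputs alone.
unknown = sorted(key for key, genotype in merged.items() if genotype is None)
```

Half of the merged grid ends up in `unknown` here, which is the gap that gVCF-style reference blocks would fill.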
    Daniel King
    @danking
    handling gVCF is on the medium-ish term roadmap
    @gtiao there’s engineering work involved in building a good interface for that. last I heard patrick was doing some work in that direction
    gtiao
    @gtiao
    Hooray — that’s functionality that would be extremely useful for us to have
    On an unrelated note, I have a pipeline that’s still running but has emitted a warning:
    WARNING: Failed to fetch GCS output:
    HttpError accessing <https://www.googleapis.com/storage/v1/b/dataproc-35495967-f2a1-4c0d-8ce0-a135afa728d4-us/o/google-cloud-dataproc-metainfo%2F41281f99-e03a-442a-a365-b8a50e8b50c7%2Fjobs%2F5dc3117ca5a24e2a937161a318173653%2Fdriveroutput.000000001?alt=json>: response: <{'status': '503', 'content-length': '0', 'expires': 'Thu, 14 Jun 2018 18:29:08 GMT', 'server': 'UploadServer', 'cache-control': 'private, max-age=0', 'date': 'Thu, 14 Jun 2018 18:29:08 GMT', 'alt-svc': 'quic=":443"; ma=2592000; v="43,42,41,39,35"', 'content-type': 'text/html; charset=UTF-8', 'x-guploader-uploadid': 'AEnB2UpYfilLrAR9zFvO8ZXEDsQUg5bgDHvlMdzA1wPpyFtUW2COEUasz_1KHKzEGZ1OJWSHazUTQUELyHCPMjh7O9kQqy04kQ'}>, content <>
    Is this bad?
    Konrad Karczewski
    @konradjk
    that looks like it's probably a matter of your connection to google (and its reporting of the job output)
    it's probably fine
    gtiao
    @gtiao
    What’s the fastest and easiest way of generating a vector of n random uniform values? I know about hl.rand_unif(0, 1) but I want a whole bunch of them, not just one
    Tim Poterba
    @tpoterba
    probably hl.range(0, N).map(lambda _: hl.rand_unif(0, 1))
    as we develop the numerical stuff and add NDArrays as a Hail type, we'll probably add the ability to do this like numpy
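For comparison, the same range-then-map shape in plain Python, using only the standard library. This is an eager analogue rather than a Hail expression; the `uniform_vector` helper and its `seed` parameter are hypothetical names introduced here.

```python
import random


def uniform_vector(n, low=0.0, high=1.0, seed=None):
    """Return a list of n draws from Uniform(low, high).

    Mirrors the shape of mapping a random draw over range(n); a fixed
    seed makes the result reproducible.
    """
    rng = random.Random(seed)
    # One independent draw per index; the index itself is unused.
    return [rng.uniform(low, high) for _ in range(n)]


values = uniform_vector(5, seed=0)
```

The Hail version builds a lazy array expression that is evaluated as part of the pipeline, whereas this list is materialized immediately in the local Python process.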
    gtiao
    @gtiao
    Excellent — thanks!