    Konrad Karczewski
    table joins are still not quite as fast as MT
    So keep till it breaks?
    Konrad Karczewski
    Is there a way to join two VCFs in Hail 0.2?
    outer join
    basically, merge
    Konrad Karczewski
    outer join on what?
    samples, variants, or both?
    Konrad Karczewski
    i don't think so
    outer on one, inner on the other is possible with union_cols and union_rows
    Is there a technical reason why you can’t do outer on both?
    “you” in the general sense, of course
    An outer join requires some knowledge of sample non-carrier status (missing vs. homozygous ref) for variants seen in one VDS but not the other. That's not possible without going back to BAMs, incorporating the TileDB storage, or supporting gVCF. Not sure where those are on the Hail roadmap.
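To make the point above concrete, here is a minimal plain-Python sketch (not Hail code) of an outer join on both variants and samples. The cohort names, variant keys, and the `outer_join` helper are all hypothetical, chosen only for illustration. Every cell that neither input covers comes back as `None`, and nothing in the inputs says whether that cell should be homozygous reference or a missing call.

```python
# Plain-Python model of an outer join on both axes; this is not Hail code.
# Each call set maps (variant, sample) -> genotype string.
from itertools import product

HOM_REF = "0/0"

def axes(calls):
    """Return the sets of variants and samples present in one call set."""
    variants = {v for v, _ in calls}
    samples = {s for _, s in calls}
    return variants, samples

def outer_join(a, b):
    """Outer join on variants and samples; cells absent from both inputs become None."""
    va, sa = axes(a)
    vb, sb = axes(b)
    merged = {}
    for cell in product(sorted(va | vb), sorted(sa | sb)):
        # Prefer a's call when both inputs have the cell, else fall back to b's (or None).
        merged[cell] = a.get(cell, b.get(cell))
    return merged

# Hypothetical toy data: "1:100" was called only in cohort A, "1:200" only in cohort B.
cohort_a = {("1:100", "s1"): "0/1", ("1:100", "s2"): HOM_REF}
cohort_b = {("1:200", "s3"): "1/1", ("1:200", "s4"): "0/1"}

merged = outer_join(cohort_a, cohort_b)
# Cells with no call in either input: hom-ref or missing? The VCFs alone cannot say.
unknown = sorted(cell for cell, gt in merged.items() if gt is None)
print(unknown)
```

Half of the eight cells in this toy merge are undetermined, which is the information a gVCF (or the original reads) would have to supply.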
    Daniel King
    handling gVCF is on the medium-ish term roadmap
    @gtiao there’s engineering work involved in building a good interface for that. last I heard patrick was doing some work in that direction
    Hooray — that’s functionality that would be extremely useful for us to have
    On an unrelated note, I have a pipeline that’s still running but has emitted a warning:
    WARNING: Failed to fetch GCS output:
    HttpError accessing <https://www.googleapis.com/storage/v1/b/dataproc-35495967-f2a1-4c0d-8ce0-a135afa728d4-us/o/google-cloud-dataproc-metainfo%2F41281f99-e03a-442a-a365-b8a50e8b50c7%2Fjobs%2F5dc3117ca5a24e2a937161a318173653%2Fdriveroutput.000000001?alt=json>: response: <{'status': '503', 'content-length': '0', 'expires': 'Thu, 14 Jun 2018 18:29:08 GMT', 'server': 'UploadServer', 'cache-control': 'private, max-age=0', 'date': 'Thu, 14 Jun 2018 18:29:08 GMT', 'alt-svc': 'quic=":443"; ma=2592000; v="43,42,41,39,35"', 'content-type': 'text/html; charset=UTF-8', 'x-guploader-uploadid': 'AEnB2UpYfilLrAR9zFvO8ZXEDsQUg5bgDHvlMdzA1wPpyFtUW2COEUasz_1KHKzEGZ1OJWSHazUTQUELyHCPMjh7O9kQqy04kQ'}>, content <>
    Is this bad?
    Konrad Karczewski
    that looks like it's probably a matter of your connection to google (and its reporting of the job output)
    it's probably fine
    What’s the fastest and easiest way of generating a vector of length n random uniform values? I know about hl.rand_unif(0,1) but I want a whole bunch of them, not just one
    Tim Poterba
    probably hl.range(0, N).map(lambda _: hl.rand_unif(0, 1))
    as we develop the numerical stuff and add NDArrays as a Hail type, we'll probably add the ability to do this like numpy
    Excellent — thanks!
    Tim Poterba
    Hi all,
    Hail Gitter chat is now deprecated. Please continue the conversation at https://hail.zulipchat.com !
    Jatin Sandhuria

    I tried running the ld_prune method on b38 1000 Genomes data but got an assertion error.
    Below is the script and the error.

    import hail as hl

    th_gn_mt = hl.read_matrix_table("/path")
    # Keep only biallelic sites.
    th_gn_biallelic_mt = th_gn_mt.filter_rows(th_gn_mt.alleles.length() == 2)
    def unphase_mt(mt: hl.MatrixTable) -> hl.MatrixTable:
        # Rebuild each call as unphased. The pasted snippet was cut off after the
        # second .when(); .or_missing() is an assumed completion.
        return mt.annotate_entries(GT=hl.case()
                                   .when(mt.GT.is_diploid(), hl.call(mt.GT[0], mt.GT[1], phased=False))
                                   .when(mt.GT.is_haploid(), hl.call(mt.GT[0], phased=False))
                                   .or_missing())
    th_gn_biallelic_mt = unphase_mt(th_gn_biallelic_mt)
    th_gn_ld_pruned_mt = hl.ld_prune(th_gn_biallelic_mt.GT, r2=0.2, bp_window_size=500000)
    2018-06-25 09:27:47 Hail: INFO: ld_prune: running local pruning stage with max queue size of 99274 variants
    2018-06-25 09:43:59 Hail: INFO: wrote 13911578 items in 1509 partitions
    2018-06-25 09:45:04 Hail: INFO: wrote 13911766 items in 1509 partitions to hdfs://prod-scc/tmp/hail.kIhhSGEG1wBW/sgOZm5mSHU
    2018-06-25 09:45:04 Hail: INFO: ld_prune: local pruning stage retained 13911766 variants
    2018-06-25 09:45:42 Hail: INFO: Wrote all 3397 blocks of 13911766 x 2504 matrix with block size 4096.
    2018-06-25 09:45:46 Hail: INFO: Coerced almost-sorted dataset
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/GWD/RDIP/apps/hail_pipeline_api/hail_0.2/hail-python_6942d09.zip/hail/typecheck/check.py", line 547, in wrapper
      File "/GWD/RDIP/apps/hail_pipeline_api/hail_0.2/hail-python_6942d09.zip/hail/methods/statgen.py", line 3034, in ld_prune
      File "/GWD/RDIP/apps/hail_pipeline_api/hail_0.2/hail-python_6942d09.zip/hail/typecheck/check.py", line 547, in wrapper
      File "/GWD/RDIP/apps/hail_pipeline_api/hail_0.2/hail-python_6942d09.zip/hail/linalg/blockmatrix.py", line 708, in _filtered_entries_table
      File "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
      File "/GWD/RDIP/apps/hail_pipeline_api/hail_0.2/hail-python_6942d09.zip/hail/utils/java.py", line 196, in deco
    hail.utils.java.FatalError: AssertionError: assertion failed
    Java stack trace:
    java.lang.AssertionError: assertion failed
            at scala.Predef$.assert(Predef.scala:156)
            at is.hail.methods.UpperIndexBounds$.computeCoverByUpperTriangularBlocks(UpperIndexBounds.scala:63)
            at is.hail.linalg.BlockMatrix.filteredEntriesTable(BlockMatrix.scala:1198)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
            at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
            at py4j.Gateway.invoke(Gateway.java:280)
            at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
            at py4j.commands.CallCommand.execute(CallCommand.java:79)
            at py4j.GatewayConnection.run(GatewayConnection.java:214)
            at java.lang.Thread.run(Thread.java:745)
    Hail version: devel-6942d090d618
    Error summary: AssertionError: assertion failed

    This is on chr1 to chr22 data.

    Tim Poterba
    @jatin-sandhuria Hi all,
    Hail Gitter chat is now deprecated. Please continue the conversation at https://hail.zulipchat.com !
    (I know you already posted there, just want this to be last message)