FriederikeHanssen
@FriederikeHanssen
:thumbsup: Great, so it will be working in the next release, right? :tada:
Michael L Heuer
@heuermh
Hopefully! I'll be testing those pull requests on AWS EMR today
Michael L Heuer
@heuermh
@FriederikeHanssen For your use case, you would be accessing CRAM files staged by Nextflow and mounted on Singularity, correct? How would the CRAM reference (.fa and .fai) be accessed?
In other words, would all of those be on local disk?
FriederikeHanssen
@FriederikeHanssen
@heuermh the CRAM will be staged by Nextflow. The reference files are mounted via nfsmount
Larry N. Singh
@larryns
I'm having some trouble with the Python version of ADAM. The latest version I can pull from pip is 0.31.0. Is version 0.33.0 not available for Python? Thanks.
Michael L Heuer
@heuermh
@larryns Yes, the release process for Python and R has run into issues lately. I thought only 0.33.0 was missing from PyPI, but it appears 0.32.0 is missing as well. I am trying to get 0.33.0 up this week, and if successful, I'll go back and push 0.32.0.
Note that the upcoming 0.34.0 release will have some incompatible changes due to bigdatagenomics/adam#2296 and fixes for bigdatagenomics/adam#2134 and bigdatagenomics/adam#2171.
Larry N. Singh
@larryns
@heuermh okay, thank you!
Michael L Heuer
@heuermh
@larryns Version 0.33.0 was successfully pushed to PyPI, and a colleague says it checks out; hope it works for you!
Larry N. Singh
@larryns
@heuermh much thanks! Will let you know if I have issues.
Tanveer Ahmad
@tahashmi
Does ADAM have an API to convert a Spark DataFrame to a SAM/BAM file? Thanks.
Supposing my DataFrame has SAM data.
somiron
@somiron
Hi everyone. Does anyone know if it's possible to use HDFS for the CRAM reference genome set through hadoopbam.cram.reference-source-path?
Larry N. Singh
@larryns
@tahashmi Something along the lines of the following should work:
// Given an ADAMContext ac and a DataFrame df that is already in the right format:
import org.seqdoop.hadoop_bam.SAMFormat

ac.loadAlignments(df)
  .saveAsSam("filename.bam", asType = Some(SAMFormat.BAM), asSingleFile = true)
Larry N. Singh
@larryns
Note that this will only work if your DataFrame can be converted properly to an AlignmentDataset. If you're working with htsjdk.samtools.SAMRecord objects, you can convert those to Alignment records with the SAMRecordConverter class. Of course, this approach assumes that you're using the Scala API. I'm not an expert by any means, but this is the approach I would take.
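As a minimal sketch of that conversion (assuming the 0.33.x names, org.bdgenomics.adam.converters.SAMRecordConverter and its convert method; the record class name changes in 0.34.0, so check your version):

import htsjdk.samtools.SAMRecord
import org.bdgenomics.adam.converters.SAMRecordConverter

val converter = new SAMRecordConverter()

// samRecords is a hypothetical RDD[SAMRecord] you already have from somewhere.
val alignmentRecords = samRecords.map(r => converter.convert(r))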
Michael L Heuer
@heuermh
@tahashmi @larryns Yes, this is correct, your dataframe should have a schema (column names) as close to that of AlignmentDataset as possible for this to work. You can use e.g. Spark SQL to view/remap columns as necessary.
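As a rough illustration of the remapping (all column names below are placeholders, not necessarily ADAM's exact schema; match the targets to the Alignment schema of the ADAM version you are using):

// Rename source columns ("qname", "seq", "qual" are hypothetical) to names ADAM expects.
val remapped = df
  .withColumnRenamed("qname", "readName")
  .withColumnRenamed("seq", "sequence")
  .withColumnRenamed("qual", "qualityScores")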

@somiron I have had some trouble with this in the past (bigdatagenomics/adam#1993), I haven't tried recently. Part of the problem is that Hadoop-BAM is essentially a dead project.

It is possible to use Disq to load CRAM and then convert to ADAM format, though the code in this particular repo took some classpath juggling to get to work:
https://github.com/heuermh/benchmarks/blob/master/cram/convert_cram_disq_adam.scala
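Roughly, the Disq side of that script looks like the following; treat the class and method names (HtsjdkReadsRddStorage, referenceSourcePath) as my recollection of the Disq API rather than a verified example, and compare against the linked script:

import org.apache.spark.api.java.JavaSparkContext
import org.disq_bio.disq.HtsjdkReadsRddStorage

val jsc = new JavaSparkContext(sc)

// Read a CRAM with Disq; referenceSourcePath (assumed option name) points at the CRAM reference.
val htsjdkReads = HtsjdkReadsRddStorage
  .makeDefault(jsc)
  .referenceSourcePath("sample.fa")
  .read("sample.cram")

// JavaRDD[SAMRecord] plus header, which can then be converted to ADAM records
// (e.g. with SAMRecordConverter as mentioned earlier in this thread).
val header = htsjdkReads.getHeader()
val samRecords = htsjdkReads.getReads()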

somiron
@somiron
Thanks for the reply @heuermh. How do I then use the reference genome in ADAM format?
Michael L Heuer
@heuermh
CRAM references on HDFS are supposed to be supported by Hadoop-BAM, so it may work. I was never able to identify what was causing the error I saw above. You may also make the CRAM reference available on local disk.
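For reference, that property is just a Hadoop configuration entry, so setting it looks roughly like this (the paths are placeholders):

// Point hadoopbam.cram.reference-source-path at the CRAM reference,
// either an hdfs:// URI or a file:// path on local disk.
sc.hadoopConfiguration.set(
  "hadoopbam.cram.reference-source-path",
  "hdfs:///refs/human_g1k_v37.fasta")  // or "file:///data/refs/human_g1k_v37.fasta"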
somiron
@somiron
The documentation for Hadoop-BAM does indeed say that referencing HDFS should work. I've checked the source code and I don't see support for HDFS implemented. In the end, I copied the reference genome to HDFS and used hadoop-fuse-dfs to mount HDFS locally. This seems to work fine.
Michael L Heuer
@heuermh
Good to know, thanks! It would be nice to replace Hadoop-BAM with Disq entirely at some point, but I am not convinced it would be worth the effort. Rather, I imagine Disq --> ADAM conversions will be added via a new module that looks something like the benchmarking scripts above.
Tanveer Ahmad
@tahashmi
Thank you so much @larryns and @heuermh
Michael L Heuer
@heuermh
FYI, I've created such a new module for using Disq to load BAM, CRAM, and VCF:
https://github.com/heuermh/disq-adam
I'm not sure yet where best it should live, in the adam repository, in another repo under the bigdatagenomics organization, or in another repo under the disq-bio organization.
hxdhan
@hxdhan

When I use this code to generate a SAM file:

val fragments = sc.loadFastq("36_11.fastq", Option("36_22.fastq"))
fragments.saveAsSam("test.sam", SAMFormat.SAM, true, false)
I got

@HD VN:1.6 SO:unsorted
SRR8418436.2 2 77 0 0 0 0 GGAAATTTAAAAAAATACACATGGCCAGGCCCCAGCCCAAATCACTAATAAGAATCTCCAGGGCTTCACCTGTTAGACTGGCAAAAATCCAAAAGTAAACA @@@bda:?FHHHDHGFHBF9A?EGE@FHGBG@DHGE;;B2=@GIDGDCFHHIBGHIIEEFC8?@b(.;CA>>>;ACC<<?CCB9?9::>:>@C:@@cc>
SRR8418436.2 2 141 0 0 0 0 CACCAGCAATGTGTAGGAATACCTGTTTCTCCACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTT ?@BDDABCCBFDHA9F@E@FHDEBEHI@DEBDAHIII1:D@GB@FHIB?@??DGDHGCGB==C>;F@@C=C;??E=CBB?;ACACA35@@

but when I use another tool (for example, Picard) I get:
@HD VN:1.6 SO:queryname
@RG ID:A SM:1
SRR8418436.2 77 0 0 0 0 GGAAATTTAAAAAAATACACATGGCCAGGCCCCAGCCCAAATCACTAATAAGAATCTCCAGGGCTTCACCTGTTAGACTGGCAAAAATCCAAAAGTAAACA @@@bda:?FHHHDHGFHBF9A?EGE@FHGBG@DHGE;;B2=@GIDGDCFHHIBGHIIEEFC8?@b(.;CA>>>;ACC<<?CCB9?9::>:>@C:@@cc> RG:Z:A
SRR8418436.2 141 0 0 0 0 CACCAGCAATGTGTAGGAATACCTGTTTCTCCACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTT ?@BDDABCCBFDHA9F@E@FHDEBEHI@DEBDAHIII1:D@GB@FHIB?@??DGDHGCGB==C>;F@@C=C;??E=CBB?;ACACA35@@ RG:Z:A

I think there is an extra column:
SRR8418436.2 2
---------------^

I want to know how to omit this column.

Michael L Heuer
@heuermh
Thanks, @hxdhan. Replied on issue bigdatagenomics/adam#2321