    Erik Garrison
    @ekg
    You were right on
    Travis Collier
    @travc

    Can you think of a decent filter along those lines for a multi-sample vcf?
    I'm still stuck on the problem of trying to determine reliably called vs uncalled for each sample at each locus. I could always fall back to a simple depth filter, but it seems like there must be something better... Though now that I look at it, DP, RO, and QR seem to be the only FMT values available to do genotype filtering on.

    PS: There are two use-cases here. One is trying to generate per-sample sequences (fasta) without defaulting to the ref. The second (more interesting one) is computing absolute diversity and divergence metrics, which is where joint calling is also really useful.
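A minimal sketch of the fallback depth filter mentioned above, assuming a standard multi-sample VCF data line with a FORMAT column that includes GT and DP. The function name `called_samples` and the `min_dp=10` threshold are hypothetical, chosen only for illustration; this is not a freebayes or vcflib feature.

```python
def called_samples(vcf_line, sample_names, min_dp=10):
    """Map each sample to True if it has a complete genotype and DP >= min_dp.

    min_dp is an illustrative threshold, not a recommended value.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")          # FORMAT column, e.g. GT:DP:RO:QR
    result = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        values = dict(zip(fmt_keys, sample_field.split(":")))
        gt = values.get("GT", ".")
        dp = values.get("DP", ".")
        has_genotype = "." not in gt          # "." or "./." means no call
        deep_enough = dp.isdigit() and int(dp) >= min_dp
        result[name] = has_genotype and deep_enough
    return result


line = "chr1\t100\t.\tA\tG\t50\t.\tDP=30\tGT:DP:RO:QR\t0/1:25:12:400\t.:.:.:.\t0/0:3:3:90"
print(called_samples(line, ["s1", "s2", "s3"]))
```

Sites where a sample comes back False could then be masked (for example written as N) instead of defaulting to the reference base when building per-sample sequences.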

    Zev Kronenberg
    @zeeev
    @ekg is master building
    vector<Allele>& altAlleles = inputVariantAlleles[currentVariant->position - 1];
    i'm getting errors there
    Erik Garrison
    @ekg
    It seems OK to me. I can try again.
    Zev Kronenberg
    @zeeev
    it probably is
    Kirill Tsyganov
    @serine

    @ekg or anyone else. I'm calling SNPs and InDels using freebayes on amplicon (targeted sequencing) data. We know the region where polymorphic events should occur. freebayes calls the SNP event nicely, but completely misses the insertion event (we know this from Sanger seq and IGV). I have 200k reads covering that short region (~300 bases). I thought that, maybe, such high coverage could be the problem and so I tried to subsample my fastq files using 0.5 and 0.1 fractions using seqtk to see if I was right, but that didn't help - no insert was called in either of the attempts.

    I tried gatk -T HaplotypeCaller and it called both events, although it called the GA -> CG event as two separate ones. The problem with HaplotypeCaller is that it seems to find more significant events elsewhere in the genome. We do have reads mapped elsewhere in the genome, but in very small numbers, e.g. 100-200 reads at a few other regions.

    I'm just curious if you have any suggestions. Also I wasn't sure if this is the right place to hit you with the question. Let me know if you think BioStars is better for this discussion.

    Thanks

    Kirill Tsyganov
    @serine
    Also, most of the tutorials suggest removing duplicates, but in my case most reads are "duplicates", so removing dups doesn't apply here. Will that somehow interfere with freebayes, and with SNP and InDel calling in general?
    Erik Garrison
    @ekg
    What command line are you using for freebayes?
    Erik Garrison
    @ekg
    @serine it really could be affecting the process if the minimum alternate fraction is set at the default
    How many reads support the indel?
    Kirill Tsyganov
    @serine
    freebayes -f ../refFiles/Mus_musculus.GRCm38.dna_sm.primary_assembly.fa P-C18-36_sample_0.1_sorted.bam > P-C18-36_freebayes_sample_0.1.vcf
    I don't know the exact number of reads supporting the insert, but from eyeballing IGV it's a fair bit: more than 100 for sure, and I'm guessing it'll actually be more than 1000. I just don't know how to grab those reads with the insert.
    This is the command I used to sub-sample reads: seqtk sample ~/lustre/raw-data/blahblah/P-C18-36_S12_L001_R1_001.fastq.gz 0.1 > P-C18-36_R1_0.1.fastq (I think seqtk does random sampling)
    Kirill Tsyganov
    @serine
    I align raw-reads with bwa mem default parameters
    Kirill Tsyganov
    @serine
    I got it ha, nice work on freebayes :D. I didn't know about the freebayes fraction (-F) option. I set it to 10% (-F 0.1) and the insert got called nicely. So to answer the previous question, it must be at least 2000 reads that support the insert. I do need to figure out the best approach though, because I have close to 50 different samples with variable coverage... should I always run it with a 10% fraction? I did think of suggesting to the researcher that future runs be sequenced at a lower depth. Anyway this is for me to figure out. Thanks heaps for the help
    Actually just one more question.
    GA -> CG event was always called with DP=180298
    TA  -> TACCTTCCGGA this event got called when I set -F 0.1, however this event is covered by more reads DP=184919
    Erik Garrison
    @ekg
    Great this worked out! I think we should drop the default threshold!! Tests suggest this will be fine but we need to do more to be sure. What exactly is your last question?
    Kirill Tsyganov
    @serine
    @ekg I don't understand how the -F switch works. I posted a more in-depth description with some data in your google group (only just found out about it :) ) https://groups.google.com/forum/#!topic/freebayes/DJB1NYdcK7E
    many thanks for help again
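For readers following the -F discussion, here is a rough arithmetic sketch of what a minimum alternate fraction implies at amplicon-scale depth. It assumes the threshold is simply a fraction of the observations at the site, combined with a minimum alternate count; the helper name `min_alt_reads` is hypothetical, and the exact comparison freebayes performs internally may differ. The 0.2 value is used purely for comparison.

```python
import math


def min_alt_reads(depth, min_alt_fraction, min_alt_count=2):
    """Smallest number of alt-supporting reads that meets both thresholds.

    Assumes the fraction applies directly to the depth at the site;
    rounding avoids float noise such as 30 * 0.1 == 3.0000000000000004.
    """
    by_fraction = math.ceil(round(depth * min_alt_fraction, 9))
    return max(min_alt_count, by_fraction)


# DP reported for the insertion in the conversation above
print(min_alt_reads(184919, 0.1))   # reads needed with -F 0.1
print(min_alt_reads(184919, 0.2))   # reads needed with a stricter 0.2 fraction
```

Under these assumptions the number of supporting reads required scales with depth, which is why a fixed fraction behaves very differently across samples with variable coverage.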
    Lance Parsons
    @lparsons
    I'm looking at some data from pooled populations of bacteria. I've run with the --haplotype-length 0 --min-alternate-count 1 --min-alternate-fraction 0 --pooled-continuous --report-monomorphic options. I'd now like to filter on allele frequency in the population (AO/RO). Is there a simple way to do this, or do I need to script a bit to pull out those values, calculate the frequency, and then filter? Thanks.
    Erik Garrison
    @ekg
    vcffilter -f "AO / RO > 0.1"
    Lance Parsons
    @lparsons
    Ah, thanks. Somehow I thought I tried that and got an error. Will try again.
    Lance Parsons
    @lparsons
    Worked great, not sure what I had done wrong before, thanks!
    BTW, might be nice to include in the docs an example that uses a computation like that, just so people know that basic arithmetic operators are allowed.
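For anyone who prefers to script the same computation, a minimal sketch in Python follows. It assumes AO and RO are present in the INFO column and that AO may hold one comma-separated value per alternate allele. The function names are hypothetical, and how vcffilter itself treats multi-allelic records is not something this sketch tries to reproduce. Note that AO / RO is an alt-to-ref ratio; a frequency in the usual sense would be AO / (AO + RO).

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of raw string values."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def alt_to_ref_ratios(info):
    """AO / RO for each alternate allele; infinity when RO is zero."""
    fields = parse_info(info)
    ro = int(fields["RO"])
    aos = [int(x) for x in fields["AO"].split(",")]
    return [ao / ro if ro > 0 else float("inf") for ao in aos]


def passes(info, threshold=0.1):
    """Keep the record if any alternate allele exceeds the ratio threshold."""
    return any(ratio > threshold for ratio in alt_to_ref_ratios(info))


print(alt_to_ref_ratios("AO=30;RO=70;DP=100"))
print(passes("AO=5;RO=95;DP=100"))
```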
    Isaac Hodes
    @ihodes
    hey all! i'm trying to get bamleftalign working on an ~80GB RNA BAM (using the checkout at v1.1.0), but for the past ~2 hours it looks like nothing's happening (0% CPU, stdout has emitted nothing).
    my invocation looks like "./freebayes/bin/bamleftalign -d -f b37decoy.fasta -c rna.bam > rna-leftaligned.bam"
    Taylor Paisie
    @taylorpaisie
    Hi everyone! Does anyone use freebayes in Galaxy?
    Lance Parsons
    @lparsons
    @taylorpaisie Yes, I use freebayes via Galaxy at times.
    Stuart
    @sbyma
    have you attempted to compile freebayes using -std=c++11 ?
    *has anyone
    Taylor Paisie
    @taylorpaisie
    I’m using freebayes to call variants in a vibrio cholerae dataset. I’m having trouble finding a good vcf filtering tool for hqSNPs. We are having trouble replicating a previous lab member’s protocol. I’m really interested in any techniques that utilize machine learning as well. Any suggestions?
    Brad Langhorst
    @bwlang
    I'm trying to get freebayes working with data aligned to the full GRCh38 set (including HLA contigs). freebayes is not handling this well because the contig names contain ':', e.g. HLA-DRB116:02:01 (they get truncated and freebayes says: "unable to find FASTA index entry for 'HLA-DRB116'"). I've just created a test case for this and was about to try to make a fix, but I thought I'd check first: is this considered a bug?
    lucidv01d
    @lucidv01d
    @sbyma, I did today, but have been running into problems
    Switching to gcc version 4.8.1 helped with some of the compile errors, but I'm still getting others
    lucidv01d
    @lucidv01d
    Update: gcc/g++ version 4.8.1 was used to test build the latest version of freebayes ( commit: 961e5f3...). I have multiple versions installed on the machine I'm compiling on (where I do not have admin rights), so I was modifying the compiler variables in src/Makefile to point to 4.8.1. Some submodules didn't compile this way, so I added the location of 4.8.1 gcc/g++ binaries to the front of my PATH and this worked. I'd recommend trying this if you are running into gcc/g++ version issues, which are related to the -std=c++11/-std=c++0x errors
    Brad Langhorst
    @bwlang
    @lucidv01d conda can help with these kinds of issues too
    Taylor Paisie
    @taylorpaisie
    In the “QUAL” column of my vcf file output from freebayes, I have very large numbers (in the thousands). Does that just mean the base call accuracy is very high?
    shibuvp
    @shibuvp
    Is it possible to create a vcf from an ADAM file (which contains Parquet) using freebayes?
    Niru Chennagiri
    @cvniru
    Can freebayes output all the candidate haplotypes and the post-haplotype-assembly BAMs?
    sunthedeep
    @sunthedeep
    Why does freebayes output single nucleotide variants with multiple bases? For example, one line in my vcf output says "MyAmplicon 862 . CCTAG CCTAT 321617 .", where I would expect "MyAmplicon 866 . G T 321617 ." Is there some parameter I need to change to fix this?
    Taylor Paisie
    @taylorpaisie
    Does anyone know a good way to extract a codon alignment from a vcf file output from freebayes? I’m able to get a SNP fasta alignment, but I would like to extract the codons from those SNPs in a fasta format.
    Janney12
    @Janney12
    how to deal with many bam files at the same time
    Taylor Paisie
    @taylorpaisie
    @Janney12 freebayes lets you use a text file as the input for all the bam files you want to call variants on
    forestdussault
    @forestdussault
    Hi, just wanted to make a bug report. I was getting the "Unable to find FASTA index entry" error in freebayes (v1.1.0-46-g8d2b3a0) and resolved it by changing the header of my FASTA file. The original header that was causing the error was ">gi|110645304|ref|NC_002516|pseudocap|136 [Pseudomonas aeruginosa PAO1 chromosome, complete genome.]". I changed it to ">PSEUDOMONAS", re-ran my pipeline, and freebayes ran without any issue. I guess it's having difficulty parsing some of the characters.
    Duarte
    @duartemolha
    Is there a simple way of getting freebayes to output as AF the observed read frequency of the genotype call instead of the theoretical values for a diploid human? AF is always 1, 0.5, or 0 even though my variants had clear biases in the allele count (for example in heterozygous duplications and deletions).
    pinninti19
    @pinninti19
    Hi, how do I decide on the --genotype-qualities in freebayes? Is there a min and max genotype quality? How much can I choose? Dataset: 1000g exome dataset. Aligners: BWA, Bowtie2, Novoalign, Cushaw3. Thanks!
    Lennard Berger
    @Fohlen
    @taylorpaisie a high quality score is very unlikely. You should look at https://en.wikipedia.org/wiki/Phred_quality_score
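As a worked illustration of the Phred scale linked above: a score Q corresponds to an error probability of 10^(-Q/10). In the VCF specification, QUAL is a Phred-scaled quality for the variant assertion at the site, which is a different quantity from per-base sequencing quality. The helper names below are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (20, 30, 3000):
    print(q, phred_to_error_prob(q))
```

Read this way, a QUAL in the thousands corresponds to a vanishingly small probability, under the caller's model, that the site call is wrong; at very high depth such values are not unusual.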