Patrik, I'm an outsider to the project who wants to learn. I have noticed that the VCF files generated by V-pipe contain information for similar columns: please check your V-pipe output folders for VCF files and look at the INFO lines at the beginning of each file.
It looks like some of the VCF and CSV files are generated by this code: https://github.com/cbg-ethz/shorah/blob/master/src/shorah/shorah_snv.py (please see lines 440 to 472, where the VCF INFO lines are processed).
Hi, for the columns of the CSV file:
Chromosome,Pos,Ref,Var,Frq1,Frq2,Frq3,Pst1,Pst2,Pst3,Fvar,Rvar,Ftot,Rtot,Pval,Qval
NC_045512.2,19164,C,T,0.1819,0.2618,-,1.0000,1.0000,-
NC_045512.2,24323,A,C,0.3804,0.4025,0.4484,1.0000,1.0000,1.0000
In shotgun mode, ShoRAH splits the genome into overlapping regions (see the parameters -w, window size, and -s, shifts between windows). For each variant called during the SNV phase, ShoRAH looks at three overlapping windows from the diri_sampler phase to confirm the variant. At least two windows are required to call a variant.
Frq...: frequency at which the variant was observed in that window
(a fractional number in the 0.0 - 1.0 range).
Pst...: posterior probability of the variant in that window
(equivalent to the haplotype's posterior in "support/").
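To make the per-window columns concrete, here is a minimal Python sketch that reads the two sample rows quoted above and counts how many windows reported each variant. It assumes that "-" means "no value for that window", which is my reading of the sample rows rather than something confirmed by the ShoRAH code. The helper names (`window_values`, `summarize`) and the mean frequency are hypothetical conveniences of mine, not something ShoRAH computes.

```python
import csv
import io

# Sample rows copied from the CSV excerpt above (the excerpt stops after Pst3).
SAMPLE = """\
Chromosome,Pos,Ref,Var,Frq1,Frq2,Frq3,Pst1,Pst2,Pst3
NC_045512.2,19164,C,T,0.1819,0.2618,-,1.0000,1.0000,-
NC_045512.2,24323,A,C,0.3804,0.4025,0.4484,1.0000,1.0000,1.0000
"""


def window_values(row, prefix):
    """Return the per-window values for 'Frq' or 'Pst', skipping '-' cells.

    Assumption: '-' marks a window that did not report the variant.
    """
    values = []
    for i in (1, 2, 3):
        cell = row[f"{prefix}{i}"]
        if cell != "-":
            values.append(float(cell))
    return values


def summarize(row):
    """Count reporting windows and average their frequencies (illustrative only)."""
    freqs = window_values(row, "Frq")
    return {
        "pos": int(row["Pos"]),
        "n_windows": len(freqs),
        "mean_freq": sum(freqs) / len(freqs),
    }


summaries = [summarize(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for s in summaries:
    print(s)
```

With these two rows, position 19164 is reported by two windows and position 24323 by all three, which matches the "at least two windows" rule described above.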
F/R...: the fil step works by comparing the proportions of forward- and reverse-mapped reads and looks for a bias toward one strand direction.
(In paired-end, double-stranded sequencing, a real SNV should be seen in roughly similar proportions in both directions, whereas a sequencing error could appear in one direction but be completely missing from the other.)
...var: number of occurrences of the variant in each respective direction
...tot: total number of reads at that position.
Pval: the p-value for the question "is there a strand bias?"
Qval: (explanations by Osvaldo)
If the forward/reverse read ratio for a variant deviates "too much" from the overall ratio, the p-value is low, i.e. there is strand bias and you cannot trust the variant call.
Nevertheless, since we are running multiple tests (one for each variant call), the chance of false-positive "triggers" (deviation from the expected ratio -> low p-value -> rejection of a variant call that should have been accepted) is fairly high. To mitigate this, a correction for multiple testing is applied, specifically Benjamini-Hochberg.
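As a rough illustration of what a strand-bias p-value measures, here is a sketch in Python that uses a two-sided Fisher's exact test on the forward/reverse counts. This is a stand-in I chose for illustration: I have not checked which test the fil step actually implements, so the numbers will not necessarily match the Pval column. It also assumes that Ftot and Rtot are the per-strand totals and include the variant reads; the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, total)


def strand_bias_pvalue(fvar, rvar, ftot, rtot):
    """p-value for 'is the variant unevenly spread over the two strands?'

    Assumption: ftot/rtot are per-strand read totals that include the
    variant reads, so the non-variant counts are ftot - fvar and rtot - rvar.
    """
    return fisher_exact_two_sided(fvar, rvar, ftot - fvar, rtot - rvar)


# Variant seen equally on both strands: no evidence of bias.
print(strand_bias_pvalue(10, 10, 100, 100))
# Variant seen only on forward reads: strong evidence of bias.
print(strand_bias_pvalue(20, 0, 100, 100))
```

The balanced case gives a p-value close to 1, while the forward-only case gives a very small one, which is the situation where you would distrust the variant call.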
More here https://xkcd.com/882/
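For anyone who wants to see the correction spelled out, below is a small sketch of the textbook Benjamini-Hochberg adjustment in plain Python. It takes one p-value per variant call and returns adjusted values of the kind usually called q-values. I am assuming the Qval column is computed along these lines; the function name and the example p-values are made up for illustration and are not taken from ShoRAH.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: q for rank k is min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Made-up p-values, one per variant call.
example_pvalues = [0.01, 0.04, 0.03, 0.20]
example_qvalues = benjamini_hochberg(example_pvalues)
print(example_qvalues)
```

Each adjusted value is at least as large as the raw p-value, so fewer variant calls get flagged as strand-biased once the number of tests is taken into account.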