These are chat archives for dereneaton/ipyrad

19th
Jan 2019
tim-oconnor
@tim-oconnor
Jan 19 05:06
Sorry to add to the heap of questions, but I have a puzzling result when comparing reference and denovo+reference assemblies for the RADcap data I'm working with.
tim-oconnor
@tim-oconnor
Jan 19 05:15

Using the same version of ipyrad (v.0.7.28) and otherwise identical settings, my reference assembly returns more (many more) loci than the denovo+reference assembly. It actually appears that denovo+reference is behaving like denovo-reference. Below is the params file for the reference-based assembly (ins-ins):

------- ipyrad params file (v.0.7.28)-------------------------------------------
ins-ins                        ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/global/scratch/toconnor/radcap/lane1/ipyrad ## [1] [project_dir]: Project dir (made in curdir if not present)
/global/scratch/toconnor/radcap/lane1/ins1*fastq.gz ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
/global/scratch/toconnor/radcap/lane1/ipyrad/ins1_bc.txt ## [3] [barcodes_path]: Location of barcodes file
                               ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
reference                      ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
/global/scratch/toconnor/radcap/radnome.ins.fasta ## [6] [reference_sequence]: Location of reference sequence file
pair3rad                       ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
AATT, TGCAG, TGCAT,            ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
3                              ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33                             ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6                              ## [11] [mindepth_statistical]: Min depth for statistical base calling
3                              ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000                          ## [13] [maxdepth]: Max cluster depth within samples
0.9                            ## [14] [clust_threshold]: Clustering threshold for de novo assembly
2                              ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
1                              ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
80                             ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2                              ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
2, 2                           ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8, 8                           ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
80                             ## [21] [min_samples_locus]: Min # samples per locus for output
20, 20                         ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
2, 2                           ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.2                            ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 0, 0, 0                     ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0                     ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
G, a, g, k, m, l, n, p, s, u, t, v ## [27] [output_formats]: Output formats (see docs)
                               ## [28] [pop_assign_file]: Path to population assignment file

Aside from the assembly name and assembly method, the params file for the denovo+reference assembly is identical. (I have quintuple-checked that the assembly method is indeed denovo+reference and not denovo-reference.)
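
For anyone who wants to verify a claim like this mechanically rather than by eye, here is a minimal sketch that compares two params files and reports only the entries that differ. It assumes every parameter line has the `value ## [index] [name]: description` layout shown above. The function names `parse_params` and `diff_params` are hypothetical helpers, not part of ipyrad, and the two short strings stand in for the full files.

```python
import re

# Matches lines such as: "reference   ## [5] [assembly_method]: Assembly method ..."
# The header line of the params file has no "##" and is skipped.
PARAM_LINE = re.compile(
    r"^(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>\w+)\]"
)


def parse_params(text):
    """Return {parameter_name: value} for every parameter line in the text."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            params[match.group("name")] = match.group("value").strip()
    return params


def diff_params(text_a, text_b):
    """Return {name: (value_a, value_b)} for parameters whose values differ."""
    a, b = parse_params(text_a), parse_params(text_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Abbreviated stand-ins for the two params files (only three lines each).
ref_params = """\
ins-ins                        ## [0] [assembly_name]: Assembly name
reference                      ## [5] [assembly_method]: Assembly method
0.9                            ## [14] [clust_threshold]: Clustering threshold
"""
both_params = """\
ins-ins-denovo                 ## [0] [assembly_name]: Assembly name
denovo+reference               ## [5] [assembly_method]: Assembly method
0.9                            ## [14] [clust_threshold]: Clustering threshold
"""

for name, (left, right) in diff_params(ref_params, both_params).items():
    print(f"{name}: {left!r} vs {right!r}")
```

In practice the two strings would be replaced by the contents of the real params files read from disk.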

But compare the results:
reference assembly

Summary stats of Assembly ins-ins
------------------------------------------------
                 state  reads_raw  reads_passed_filter  refseq_mapped_reads      ...        clusters_hidepth  hetero_est  error_est  reads_consens
E18-001              6     409336               408212                47521      ...                    7503    0.005111   0.000361           7423
E18-002              6     348024               347098                44068      ...                    7324    0.005344   0.000324           7243
E18-003              6     582426               580723                59146      ...                    7641    0.004272   0.000261           7570
E18-004              6     525881               524498                59077      ...                    7715    0.005299   0.000281           7636
E18-005              6    1965073              1959158               161456      ...                    8024    0.004609   0.000198           7961
E18-006              6     436475               435134                46063      ...                    7353    0.004388   0.000329           7264
E18-007              6     177001               176441                26121      ...                    6407    0.005170   0.000548           6307
E18-008              6     179501               178899                25521      ...                    6331    0.004731   0.000511           6243
E18-009              6     271484               270564                26721      ...                    4291    0.004932   0.000523           4209
E18-010              6     512928               511467                55625      ...                    7615    0.005511   0.000242           7531

denovo+reference assembly

Summary stats of Assembly ins-ins-denovo
------------------------------------------------
                 state  reads_raw  reads_passed_filter  refseq_mapped_reads    ...      clusters_total  clusters_hidepth  hetero_est  error_est
E18-001              3     409336               408212                47521    ...               19606                 0         NaN        NaN
E18-002              3     348024               347098                44068    ...               17514                 0         NaN        NaN
E18-003              4     582426               580723                59146    ...               24090              1305    0.010000   0.001000
E18-004              3     525881               524498                59077    ...               22704                 0         NaN        NaN
E18-005              4    1965073              1959158               161456    ...               60348              3602    0.001618   0.000014
E18-006              3     436475               435134                46063    ...               21478                 0         NaN        NaN
E18-007              4     177001               176441                26121    ...               15045               574    0.010000   0.001000
E18-008              4     179501               178899                25521    ...               15150               452    0.010000   0.001000
E18-009              4     271484               270564                26721    ...               16206               529    0.010000   0.001000
E18-010              4     512928               511467                55625    ...               22208               554    0.010000   0.001000

The much lower number of high-depth clusters would make sense if the program were subtracting reads that map to the reference, which I expect (hope!) to be the majority of my reads in a RADcap library. I'd very much appreciate any help.
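
To check that expectation against the tables above, here is a minimal sketch that computes, per sample, the fraction of filtered reads that mapped to the reference. The numbers are copied from the `reads_passed_filter` and `refseq_mapped_reads` columns above. The function name `mapped_fraction` is a hypothetical helper, not an ipyrad function.

```python
# (reads_passed_filter, refseq_mapped_reads) per sample, copied from the
# summary stats tables above.
stats = {
    "E18-001": (408212, 47521),
    "E18-002": (347098, 44068),
    "E18-003": (580723, 59146),
    "E18-004": (524498, 59077),
    "E18-005": (1959158, 161456),
    "E18-006": (435134, 46063),
    "E18-007": (176441, 26121),
    "E18-008": (178899, 25521),
    "E18-009": (270564, 26721),
    "E18-010": (511467, 55625),
}


def mapped_fraction(stats):
    """Return {sample: mapped_reads / reads_passed_filter}."""
    return {
        sample: mapped / passed
        for sample, (passed, mapped) in stats.items()
    }


for sample, fraction in mapped_fraction(stats).items():
    print(f"{sample}\t{fraction:.3f}")
```

These two columns are identical in both summaries, so the read-mapping step appears to have produced the same result in each run. Any difference would then arise downstream of mapping.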