These are chat archives for dereneaton/ipyrad

14th
Apr 2018
Ollie White
@Ollie_W_White_twitter
Apr 14 2018 15:16
Hi @isaacovercast, yes, you were right, thank you! I set up two branches, both with the minimum cluster depth set to 5 (the lowest possible, I think), and one with R2 also trimmed to 100 bp. Trimming the end of R2 seems to increase the number of clusters substantially. Below are the cluster results for a min depth of 5:
cat des-md5_clust_0.85/s3_cluster_stats.txt
         clusters_total  hidepth_min clusters_hidepth avg_depth_total avg_depth_mj avg_depth_stat sd_depth_total sd_depth_mj sd_depth_stat filtered_bad_align
art_B82          144045            5             2958            1.34         6.07           6.07           0.94        1.43          1.43                  0
art_B83           90704            5                0            1.19          nan            nan           0.48         nan           nan                  0
bour_51v         222474            5                0            1.06          nan            nan           0.24         nan           nan                  0
bour_563         128376            5              306            1.19         5.00           5.00           0.56        0.00          0.00                  0
bour_GH          107010            5             1982            1.39         5.78           5.78           0.91        1.15          1.15                  0
dep_C26          123730            5             3836            1.43         6.30           6.30           1.12        1.68          1.68                  0
gil_131a          78835            5              150            1.18         5.44           5.44           0.52        0.90          0.90                  0
gil_B163         155496            5               87            1.21         5.00           5.00           0.57        0.00          0.00                  0
gon_B162         131624            5             1854            1.28         5.87           5.87           0.81        1.29          1.29                  0
gon_GHA           70576            5                0            1.17          nan            nan           0.45         nan           nan                  0
lem_98a          196171            5                0            1.07          nan            nan           0.25         nan           nan                  0
lem_98b          229100            5                0            1.11          nan            nan           0.34         nan           nan                  0
lem_B157         238980            5             1568            1.17         5.96           5.96           0.61        1.34          1.34                  0
mil_125a         200175            5                0            1.18          nan            nan           0.47         nan           nan                  0
mil_128b         199994            5             3652            1.30         6.39           6.39           0.92        1.75          1.75                  0
mil_94v           89538            5              331            1.21         5.37           5.37           0.58        0.67          0.67                  0
mil_GHA          127625            5             2653            1.38         5.95           5.95           0.95        1.31          1.31                  0
pre_B120         117908            5                0            1.12          nan            nan           0.32         nan           nan                  0
pre_GHA          106899            5                0            1.18          nan            nan           0.46         nan           nan                  0
tan_C6           103078            5                0            1.24          nan            nan           0.55         nan           nan                  0

And below are the results for a min depth of 5 with R2 trimmed to 100 bp.

cat des-md5-trimr2_clust_0.85/s3_cluster_stats.txt
         clusters_total  hidepth_min clusters_hidepth avg_depth_total avg_depth_mj avg_depth_stat sd_depth_total sd_depth_mj sd_depth_stat filtered_bad_align
art_B82          160430            5             6770            1.64         5.28           5.28           1.13        0.45          0.45                  0
art_B83          121147            5             8971            1.87         6.94           6.94           1.77        2.47          2.47                  0
bour_51v         288774            5            13456            1.54         6.87           6.87           1.42        2.18          2.18                  0
bour_563         162419            5             6419            1.59         6.07           6.07           1.22        1.41          1.41                  0
bour_GH          120889            5             1509            1.55         5.00           5.00           0.92        0.00          0.00                  0
dep_C26          157890            5            15477            2.00         6.92           6.92           1.94        2.25          2.25                  0
gil_131a          85763            5              600            1.34         5.54           5.54           0.73        1.00          1.00                  0
gil_B163         177393            5               68            1.38         5.00           5.00           0.77        0.00          0.00                  0
gon_B162         148824            5             1999            1.49         5.00           5.00           0.90        0.00          0.00                  0
gon_GHA           89451            5             2652            1.55         6.01           6.01           1.11        1.40          1.40                  0
lem_98a          252408            5             9344            1.50         6.32           6.32           1.21        1.66          1.66                  0
lem_98b          259015            5                0            1.31          nan            nan           0.66         nan           nan                  0
lem_B157         263949            5             7342            1.44         5.68           5.68           0.99        0.78          0.78                  0
mil_125a         226132            5              646            1.41         5.00           5.00           0.80        0.00          0.00                  0
mil_128b         225073            5            15809            1.76         6.93           6.93           1.70        2.18          2.18                  0
mil_94v           98893            5                0            1.31          nan            nan           0.59         nan           nan                  0
mil_GHA          143273            5             9607            1.82         6.25           6.25           1.51        1.60          1.60                  0
pre_B120         157367            5             9054            1.72         6.47           6.47           1.47        1.83          1.83                  0
pre_GHA          143254            5             9510            1.80         6.36           6.36           1.52        1.69          1.69                  0
tan_C6           137173            5            11819            1.99         6.61           6.61           1.76        2.04          2.04                  0

Would you recommend trimming R2 further? Some samples are still coming out with zero high-depth clusters. It would probably be easier to assemble just the forward or reverse reads, but it seems a shame to get rid of good data. Cheers, Ollie
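A minimal Python sketch (not part of ipyrad) for comparing the two stats files and listing samples that still have zero high-depth clusters. It assumes the whitespace-delimited layout shown above: a header row of column names, then one row per sample starting with the sample name. The embedded rows are four samples copied from the two tables, trimmed to their first four columns; `parse_stats`, `zero_hidepth`, and `compare_hidepth` are hypothetical helper names. For real files you would pass `open(path).read()` to `parse_stats`.

```python
# Toy comparison of two s3_cluster_stats.txt tables (layout assumed as above).
# Rows are copied from the two runs, first four columns only.
RUN_MD5 = """\
         clusters_total  hidepth_min clusters_hidepth avg_depth_total
art_B83           90704            5                0            1.19
bour_51v         222474            5                0            1.06
lem_98b          229100            5                0            1.11
mil_94v           89538            5              331            1.21
"""

RUN_MD5_TRIMR2 = """\
         clusters_total  hidepth_min clusters_hidepth avg_depth_total
art_B83          121147            5             8971            1.87
bour_51v         288774            5            13456            1.54
lem_98b          259015            5                0            1.31
mil_94v           98893            5                0            1.31
"""


def parse_stats(text):
    """Return {sample: {column: float}}; 'nan' cells parse as float('nan')."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # the header row has no sample-name column
    table = {}
    for ln in lines[1:]:
        sample, *values = ln.split()
        table[sample] = {col: float(v) for col, v in zip(header, values)}
    return table


def zero_hidepth(table):
    """Samples whose clusters_hidepth count is zero."""
    return sorted(s for s, row in table.items() if row["clusters_hidepth"] == 0)


def compare_hidepth(before, after):
    """(sample, before, after) clusters_hidepth for samples in both runs."""
    return [
        (s, int(before[s]["clusters_hidepth"]), int(after[s]["clusters_hidepth"]))
        for s in sorted(before.keys() & after.keys())
    ]


if __name__ == "__main__":
    md5 = parse_stats(RUN_MD5)
    trim = parse_stats(RUN_MD5_TRIMR2)
    for sample, b, a in compare_hidepth(md5, trim):
        print(f"{sample:<10} {b:>6} -> {a:>6}")
    print("still zero after trimming R2:", zero_hidepth(trim))
```

On these rows, `zero_hidepth` reports `lem_98b` and `mil_94v` for the trimmed run; `mil_94v` goes from 331 high-depth clusters to 0, so trimming R2 does not help every sample.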

Isaac Overcast
@isaacovercast
Apr 14 2018 16:53
@pbpearman Almost right. After step 1 there will be a _fastqs directory that contains all the demultiplexed sample fastqs. At this point you can create multiple new assemblies, let's say ipyrad -n 0.8, ipyrad -n 0.9, and so on. Then set the sample fastq path in each new params file to point to the fastqs directory and run step 1 on each of the new assemblies. This will link the fastqs that already exist but won't create any new files. Now when you run step 2 on each of the new assemblies, each will create its own new _edits directory, and steps 3-7 will then proceed independently.
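A minimal Python sketch of that branching workflow, under stated assumptions: the params file written by `ipyrad -n NAME` is assumed to be `params-NAME.txt`, with one parameter per line in the form `value   ## [index] [param_name]: description`, and the original assembly is hypothetically called `orig`, so its demultiplexed reads sit in `./orig_fastqs/`. `set_param` and `branch_commands` are illustrative helpers, not part of ipyrad; the sketch only edits text and lists commands, it does not run ipyrad.

```python
import re

# Hypothetical excerpt of a freshly created params file (layout assumed).
EXAMPLE_PARAMS = """\
0.8                            ## [0] [assembly_name]: Assembly name
./                             ## [1] [project_dir]: Project dir
                               ## [2] [raw_fastq_path]: Location of raw fastq files
                               ## [3] [barcodes_path]: Location of barcodes file
                               ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
"""


def set_param(params_text, name, value):
    """Replace the value field of the line tagged [name]; keep the comment."""
    # [ \t]+ (not \s+) so a match can never run across a newline.
    pattern = re.compile(
        r"^.*?([ \t]+## \[\d+\] \[" + re.escape(name) + r"\]:.*)$", re.MULTILINE
    )
    # A function replacement avoids backslash escapes in `value`.
    new_text, count = pattern.subn(lambda m: value + m.group(1), params_text)
    if count != 1:
        raise KeyError(f"expected exactly one [{name}] line, found {count}")
    return new_text


def branch_commands(names):
    """ipyrad CLI calls for each branch, in the order described above."""
    cmds = []
    for name in names:
        params = f"params-{name}.txt"
        cmds.append(f"ipyrad -n {name}")  # creates the new params file
        # ...point sorted_fastq_path at the existing _fastqs dir (set_param)...
        cmds.append(f"ipyrad -p {params} -s 1")  # links existing fastqs
        cmds.append(f"ipyrad -p {params} -s 2")  # writes this branch's _edits
        cmds.append(f"ipyrad -p {params} -s 34567")  # runs independently
    return cmds


if __name__ == "__main__":
    print(set_param(EXAMPLE_PARAMS, "sorted_fastq_path", "./orig_fastqs/*.gz"))
    for cmd in branch_commands(["0.8", "0.9"]):
        print(cmd)
```

Keeping the shared reads in one _fastqs directory means each branch only adds its own _edits and later outputs, which is the point of the workflow described above.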
Isaac Overcast
@isaacovercast
Apr 14 2018 16:59
@Ollie_W_White_twitter Good, that's what I expected. It might be better to play with trimming R1 a bit before trimming R2 further. Also, mindepth can be set lower than 5: mindepth majority rule can go down to 3, I believe, but there's a tradeoff between the quality of base calls and the number of clusters. Really, if you get good numbers of high-depth clusters using only R1, I think it's probably better in the long run to use good-quality data rather than trying to triage the paired-end data and maybe sacrificing some signal.
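To make the depth tradeoff concrete, here is a toy Python sketch, not ipyrad's consensus code. `mindepth_statistical` and `mindepth_majrule` are ipyrad's parameter names; the rule assumed here is that clusters at or above `mindepth_statistical` get statistical base calls, clusters between the two thresholds get simple majority-rule calls, and shallower clusters are dropped. The cluster depths are made up, and `call_mode`, `majority_base`, and `retained` are hypothetical helpers (returning no call on a tie is also a simplifying assumption).

```python
from collections import Counter


def call_mode(depth, mindepth_statistical, mindepth_majrule):
    """Classify one cluster by read depth (assumed rule, see lead-in)."""
    if depth >= mindepth_statistical:
        return "statistical"
    if depth >= mindepth_majrule:
        return "majrule"
    return "dropped"


def majority_base(column):
    """Most common A/C/G/T at one site of a cluster; None if absent or tied."""
    counts = Counter(b for b in column.upper() if b in "ACGT").most_common(2)
    if not counts:
        return None
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def retained(depths, mindepth_statistical, mindepth_majrule):
    """Tally clusters per call mode for a list of cluster depths."""
    return Counter(call_mode(d, mindepth_statistical, mindepth_majrule) for d in depths)


if __name__ == "__main__":
    DEPTHS = [1, 2, 3, 3, 4, 5, 6, 9]  # made-up cluster depths
    print(retained(DEPTHS, 5, 5))  # dropped 5, statistical 3
    print(retained(DEPTHS, 5, 3))  # dropped 2, majrule 3, statistical 3
    print(majority_base("AAT"))  # A: one discordant read out-voted 2:1
```

With these depths, lowering `mindepth_majrule` from 5 to 3 keeps three more clusters, but each of them is called by a simple vote, where one sequencing error at depth 3 already leaves only a 2:1 margin.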