These are chat archives for dereneaton/ipyrad

4th
May 2017
vieves
@vieves
May 04 2017 02:12
Hi @dereneaton @isaacovercast, I've been having trouble with ipyrad trimming. I have a pairddrad set of 93 individuals. I can run an assembly if I leave all of the default parameters, but when I assess the genotypes, very few loci have 1 SNP and most loci have 2 or 3 SNPs per locus. When I checked my files with FastQC, I could see that the last ~15-20 bases of each read are low quality, so I wanted to trim those off and see if the assembled genotypes make more sense. But when I trim, nothing passes step 2. All of my samples say "No reads passed filtering in Sample: 4704-plate0-000-AL2016-750-MiSeqNEBUltra_S00_L000001", and ipyrad_log.txt shows this: "ERROR:ipyrad.assemble.cluster_within:sample [4704-plate0-000-AL2016-878-MiSeqNEBUltra_S00_L000001] failed in step [derep_concat_split]; error: IPyradError(('Error in merge pairs:\n %s\n%s', ['/u1/uaf/gmjohnson7/miniconda2/lib/python2.7/site-packages/bin/vsearch-linux-x86_64', '--fastq_mergepairs', '/import/c1/w/gmjohnson7/ipyrad_files/ipy170502_edits/4704-plate0-000-AL2016-878-MiSeqNEBUltra_S00_L000001.trimmedR1.fastq.tmp1', '--reverse', '/import/c1/w/gmjohnson7/ipyrad_files/ipy170502_edits/4704-plate0-000-AL2016-878-MiSeqNEBUltra_S00_L000001.trimmedR2.fastq.tmp2', '--fastqout', '/import/c1/w/gmjohnson7/ipyrad_files/ipy170503b_edits/4704-plate0-000-AL2016-878-MiSeqNEBUltra_S00_L000001merged.fastq', '--fastqout_notmerged_fwd', '/import/c1/w/gmjohnson7/ipyrad_files/ipy170503b_edits/tmp3Q4yGf_nonmergedR1.fastq', '--fastqout_notmerged_rev', '/import/c1/w/gmjohnson7/ipyrad_files/ipy170503b_edits/tmphsRlHf_nonmergedR2.fastq', '--fasta_width', '0', '--fastq_minmergelen', '35', '--fastq_maxns', '5', '--fastq_minovlen', '20', '--fastq_maxdiffs', '4', '--label_suffix', '_m1', '--fastq_qmax', '1000', '--threads', '2', '--fastq_allowmergestagger'], "vsearch v2.0.3_linux_x86_64, 126.1GB RAM, 24 cores\nhttps://github.com/torognes/vsearch\n\nMerging reads\n\nFatal error: Invalid line 3049654 in FASTQ file: Illegal character 'z'\n"))" Am I trimming too much? My files came from two separate lanes and I merged them before inputting them into ipyrad; could that step have caused a problem? Also, usually when I encounter this step 2 error, the job finishes within a few minutes and never makes it to later steps, but this particular run went for 6+ hours, so I thought it was working. (I am using v.0.6.17)
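The vsearch message names a specific line of the FASTQ file it was reading. Because a FASTQ record spans four lines (header, sequence, `+` separator, quality), the line number can be mapped to a record and field: 3049654 = 4 × 762413 + 2, so, assuming no wrapped or blank lines, it would be line 2 (the sequence line) of record 762414, where a `z` does not belong. Below is a minimal Python sketch of a hypothetical checker, `find_bad_fastq_line`, that walks a plain or gzipped FASTQ and reports the first line breaking these four-line rules. It is not part of ipyrad or vsearch, and the accepted sequence alphabet (A, C, G, T, N) is an assumption.

```python
import gzip
from typing import Optional, Tuple

# Assumption: sequence lines may only contain A, C, G, T or N (any case).
VALID_BASES = set("ACGTN")


def locate_line(line_no: int) -> Tuple[int, int]:
    """Map a 1-based file line number to (1-based record, 1-based field)."""
    idx = line_no - 1
    return idx // 4 + 1, idx % 4 + 1


def find_bad_fastq_line(path: str) -> Optional[Tuple[int, str]]:
    """Return (line_number, reason) for the first malformed line, or None."""
    opener = gzip.open if path.endswith(".gz") else open
    seq = ""
    line_no = 0
    with opener(path, "rt") as handle:
        for line_no, raw in enumerate(handle, start=1):
            line = raw.rstrip("\r\n")
            field = (line_no - 1) % 4  # 0=header, 1=sequence, 2='+', 3=quality
            if field == 0 and not line.startswith("@"):
                return line_no, "header does not start with '@'"
            if field == 1:
                bad = set(line.upper()) - VALID_BASES
                if bad:
                    return line_no, f"illegal sequence character(s) {sorted(bad)}"
                seq = line
            if field == 2 and not line.startswith("+"):
                return line_no, "separator does not start with '+'"
            if field == 3 and len(line) != len(seq):
                return line_no, "quality length differs from sequence length"
    # A line count that is not a multiple of 4 means the last record is cut off.
    if line_no % 4:
        return line_no, "file ends mid-record"
    return None
```

Running this on each merged input `.fastq.gz` would show whether the bad character was already present before ipyrad touched the data (for example, from the lane merge) or only appears in ipyrad's intermediate files.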
Isaac Overcast
@isaacovercast
May 04 2017 02:16
How did you do the merge?
vieves
@vieves
May 04 2017 02:20
for s in 939 653 654 655 656 662 479 480 481 482 876 877 878 882 883 884 940 885 941 943 944 945 946 658 663 872 648 650 472 473 474 475 476 477 866 867 868 870 871 873 874 928 929 930 932 933 934 936 937 647 673 674 675 683 685 687 748 757 952 955 226 230 463 465 466 467 470 678 679 680 750 751 752 755 756 886 887 888 889 890 891 892 893 894 896 897 898 901 903 904 905 907 908 909 910 911 912 913 915 919
do
for j in R1 R2
do
zcat ./center/w/gmjohnson7/Fastq_Genohub7780528/${s}L005${j} ./center/w/gmjohnson7/Fastq_Genohub7780528/${s}L006${j} | gzip > ./center/w/gmjohnson7/Fastq_Genohub7780528_cat/4704-plate0-000-AL2016-${s}-MiSeqNEBUltra_S00L000${j}_001.fastq.gz
done
done
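For comparison, here is a minimal Python sketch of the same per-sample lane merge (the `zcat lane1 lane2 | gzip > out` pattern), with a record-count check so a bad merge is caught before ipyrad runs. The function names and file names are hypothetical; the only idea is that each merged file should contain exactly as many four-line records as its lane inputs combined.

```python
import gzip
import shutil
from typing import Sequence


def count_records(path: str) -> int:
    """Count FASTQ records (4 lines each) in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def merge_lanes(lane_paths: Sequence[str], out_path: str) -> int:
    """Decompress each lane in order into one gzipped file, then verify counts."""
    with gzip.open(out_path, "wb") as out:
        for lane in lane_paths:
            with gzip.open(lane, "rb") as src:
                shutil.copyfileobj(src, out)
    expected = sum(count_records(p) for p in lane_paths)
    observed = count_records(out_path)  # raises ValueError on a broken record
    if observed != expected:
        raise RuntimeError(f"{out_path}: {observed} records, expected {expected}")
    return observed
```

The check also flags a lane file that lacks a final newline: its last quality line would fuse with the next lane's first header, leaving a line count that is not a multiple of 4.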
Isaac Overcast
@isaacovercast
May 04 2017 02:28
This doesn't look like a trimming issue. The error coming from derep_concat_split is from step 3, which is the step after filtering is applied. What does your params file look like? Also, that script doesn't look like it's cat-ing 2 lanes of samples, to me at least. It might be easier to create params files for each lane, run step 1 on each lane, then merge and run step 2. This is the cleanest way to do it.
vieves
@vieves
May 04 2017 02:32

------- ipyrad params file (v.0.6.17)-------------------------------------------
ipy170503b ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/center1/w/gmjohnson7/ipyrad_files ## [1] [project_dir]: Project dir (made in curdir if not present)
                           ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
                           ## [3] [barcodes_path]: Location of barcodes file
/center1/w/gmjohnson7/Fastq_Genohub7780528_cat/* ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
                           ## [6] [reference_sequence]: Location of reference sequence file
pairddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
AATTC, CGG ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
40000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
0 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
5, 5 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8, 8 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
4 ## [21] [min_samples_locus]: Min # samples per locus for output
20, 20 ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
8, 8 ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 20, 0,20 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, v ## [27] [output_formats]: Output formats (see docs)
                           ## [28] [pop_assign_file]: Path to population assignment file
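Since the params file layout puts one parameter per line, a quick sanity check on a pasted or hand-edited file is that no line carries two `## [N] [name]` markers (for example [25] and [26] running together). Below is a minimal Python sketch of a hypothetical checker, `parse_params`. It is not ipyrad's own parser and only assumes the `value ## [N] [name]: description` layout visible in this file.

```python
import re
from typing import Dict, List, Tuple

# Matches markers such as "## [25] [trim_reads]".
MARKER = re.compile(r"##\s*\[(\d+)\]\s*\[(\w+)\]")


def parse_params(text: str) -> Tuple[Dict[str, str], List[int]]:
    """Return ({param_name: value}, [1-based line numbers holding >1 marker])."""
    params: Dict[str, str] = {}
    fused: List[int] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        markers = MARKER.findall(line)
        if not markers:
            continue  # header or any other non-parameter line
        if len(markers) > 1:
            fused.append(line_no)  # only the first parameter is recorded below
        value = line.split("##", 1)[0].strip()  # empty string for unset params
        params[markers[0][1]] = value
    return params, fused
```

Any line number returned in `fused` is a spot where a later parameter would be silently swallowed by the earlier one, so it is worth splitting before rerunning.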
Isaac Overcast
@isaacovercast
May 04 2017 02:35
This might be a problem: /center1/w/gmjohnson7/Fastq_Genohub7780528_cat/*, if there are other files in this directory. You should specify /center1/w/gmjohnson7/Fastq_Genohub7780528_cat/*.fastq.gz to make sure no other files are incorporated.
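The difference between the two patterns can be checked before running step 1. Below is a small Python sketch using the standard `glob` module that lists what `*` matches and which of those files `*.fastq.gz` would leave out, assuming the path is expanded like a shell glob. The function name and test file names are hypothetical.

```python
import glob
import os
from typing import List, Tuple


def compare_patterns(directory: str) -> Tuple[List[str], List[str]]:
    """Return (files matched by '*', files matched by '*' but not '*.fastq.gz')."""
    everything = sorted(glob.glob(os.path.join(directory, "*")))
    fastqs = set(glob.glob(os.path.join(directory, "*.fastq.gz")))
    extras = [p for p in everything if p not in fastqs]
    return everything, extras
```

An empty `extras` list means the two patterns select the same files in that directory.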
vieves
@vieves
May 04 2017 05:52
All the files in that directory are .fastq.gz. I will try your suggestion of running step 1 on each lane separately. I understand how to branch one assembly into separate assemblies, but how do I join the separate assemblies into one?
Edgardo M. Ortiz
@edgardomortiz
May 04 2017 15:07

Hello @dereneaton @isaacovercast , any idea about this error?


 -------------------------------------------------------------
  ipyrad [v.0.6.17]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: panicum-pe
  from saved path: /scratch/01982/jdpalaci/ddrad/e_PE-86_ipyrad/panicum-pe.json
  host compute node: [64 cores] on nid02007

  Step 6: Clustering at 0.86 similarity across 666 samples
  [####################] 100%  concat/shuffle input  | 0:06:49
  [####################] 100%  clustering across     | 3:24:50
  [####################] 100%  building clusters     | 0:06:26
  [####################] 100%  aligning clusters     | 1:09:11
  Encountered an error (see details in ./ipyrad_log.txt)
  Error summary is below -------------------------------
error in step 6 KeyError('SEV_05-F_HAL_292_66782;*2')

The file ipyrad_log.txt says this:

 -------------------------------------------------------------
  ipyrad [v.0.6.17]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  Begin run: 2017-05-02 18:41
  Using args {'preview': False, 'force': True, 'threads': 2, 'results': False, 'quiet': False, 'merge': None, 'ipcluster': True, 'cores': 0, 'params': 'params-panicum-pe.txt', 'branch': None, 'steps': '67', 'debug': False, 'new': None, 'MPI': False}
  Platform info: ('Linux', 'nid02007', '3.0.101-0.47.86.1.11753.0.PTF-default', '#1 SMP Wed Oct 19 14:11:00 UTC 2016 (56c73f1)', 'x86_64')
2017-05-02 23:29:09,322     pid=74004     [cluster_across.py]    ERROR     error in persistent_popen_align KeyError('SEV_05-F_HAL_292_66782;*2')
2017-05-02 23:29:09,570     pid=74004     [assembly.py]    ERROR     IPyradWarningExit: error in step 6 KeyError('SEV_05-F_HAL_292_66782;*2')

Thanks!

Isaac Overcast
@isaacovercast
May 04 2017 16:19
@edgardomortiz Mmm, yeah, I've seen that before. Can you email me the .json file /scratch/01982/jdpalaci/ddrad/e_PE-86_ipyrad/panicum-pe.json
@vieves Use the merge flag, like this: ipyrad -m merged-assembly params-1.txt params-2.txt where the two params files belong to existing assemblies. This will create a file called params-merged-assembly.txt.
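Putting the per-lane suggestion together: run step 1 on each lane's assembly, merge with `-m`, then continue from step 2 on the merged params file. Below is a minimal Python sketch that only builds (and optionally runs) those command lines. The lane params file names are hypothetical; `-p` (params file) and `-s` (steps, given as a digit string like the `'67'` in the log above) are the usual ipyrad CLI options, and `-m` is used as described in this thread.

```python
import subprocess
from typing import List, Sequence


def merge_workflow(lane_params: Sequence[str], merged_name: str) -> List[List[str]]:
    """Build ipyrad commands: step 1 per lane, merge, then steps 2-7 on the merge."""
    cmds = [["ipyrad", "-p", p, "-s", "1"] for p in lane_params]
    cmds.append(["ipyrad", "-m", merged_name, *lane_params])
    # The merge writes params-<merged_name>.txt, which drives the remaining steps.
    cmds.append(["ipyrad", "-p", f"params-{merged_name}.txt", "-s", "234567"])
    return cmds


def run(cmds: List[List[str]]) -> None:
    """Run each command in order, stopping at the first one that fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to print and inspect the plan first, or to trim the final step string to `"2"` while testing filtering settings.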