These are chat archives for dereneaton/ipyrad

16th
Nov 2018
Maribet Gamboa
@maribetg
Nov 16 2018 03:30
Hi all,
I have a problem running step 1. ipyrad was installed successfully, but I encountered an unexpected error at less than 10% of sorting reads. ipyrad_log.txt says "insecure string pickle". I think it has something to do with the Python connection, but I am not sure. Does anybody know what I should do? Thank you.
Isaac Overcast
@isaacovercast
Nov 16 2018 05:24
@maribetg Can you post the exact error message? Any exact error text from the terminal and from the ipyrad_log.txt file will be useful.
Also, more information about your RAD data type and any params you changed from the defaults.
Maribet Gamboa
@maribetg
Nov 16 2018 05:29

@isaacovercast This is the error message:
-------------------------------------------------------------
ipyrad [v.0.7.28]

Interactive assembly and analysis of RAD-seq data

Begin run: 2018-11-14 12:04
Using args {'preview': False, 'force': False, 'threads': 2, 'results': True, 'quiet': False, 'merge': None, 'ipcluster': None, 'cores': 0, 'params': 'params-P001.txt', 'branch': None, 'steps': '1', 'debug': False, 'new': None, 'download': None, 'MPI': False}
Platform info: ('Linux', 'Ubuntu', '4.4.0-134-generic', '#160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018', 'x86_64')
2018-11-14 12:04:54,210 pid=17448 [assembly.py] ERROR insecure string pickle

I have ddRAD data, and here is my params file. Thank you.
------- ipyrad params file (v.0.7.28)-------------------------------------------
P001 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
./ ## [1] [project_dir]: Project dir (made in curdir if not present)
./raw/P001/*fastq.gz ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
./raw/BarcodesP001.txt ## [3] [barcodes_path]: Location of barcodes file
                           ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
                           ## [6] [reference_sequence]: Location of reference sequence file
ddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAG, CCGG ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
5, 5 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8, 8 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
4 ## [21] [min_samples_locus]: Min # samples per locus for output
20, 20 ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
8, 8 ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, v, n, k, g, u ## [27] [output_formats]: Output formats (see docs)
                           ## [28] [pop_assign_file]: Path to population assignment file
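
For comparing a params file like this one against another (for example, a freshly generated default file), here is a minimal Python sketch. It assumes the `value ## [index] [name]: description` layout shown above, with one parameter per line. The names `parse_params` and `changed` are hypothetical helpers for illustration and are not part of ipyrad.

```python
import re

# Matches lines such as: "ddrad ## [7] [datatype]: Datatype (see docs)"
PARAM_LINE = re.compile(
    r"^(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>\w+)\]:"
)


def parse_params(text):
    """Return {name: value} for each parameter line in params-file text."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:  # the header line and blank lines do not match
            params[match.group("name")] = match.group("value").strip()
    return params


def changed(params, reference):
    """Return {name: (reference value, current value)} where the two differ."""
    return {
        name: (reference.get(name), value)
        for name, value in params.items()
        if reference.get(name) != value
    }


sample = """\
------- ipyrad params file (v.0.7.28)-------------------------------------------
P001 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
                           ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
ddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAG, CCGG ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
"""
params = parse_params(sample)
print(params)
```

Parsing two files and passing both dictionaries to `changed` lists only the settings that differ, which is usually the part worth posting.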
Isaac Overcast
@isaacovercast
Nov 16 2018 06:51
@maribetg How big are the raw files? Can you run step 1 with the -d flag to write more info to the debug file?
Maribet Gamboa
@maribetg
Nov 16 2018 07:16
@isaacovercast Yeah, I found the same website response to the pickle issue. The raw files are 26.6 GB for 112 individuals. I ran it again using the -d flag, and now I got a different error message. Here are the last 8 lines of the log file.
2018-11-16 16:10:11,984 pid=4258 [assembly.py] ERROR invalid load key, '4'.
2018-11-16 16:10:12,010 pid=4447 [demultiplex.py] DEBUG Doing chunk /media/ecology21/6f42432b-18ab-4dcc-ae3d-386002e6c781/Mari/phylogenomics/raw/P001/Index_R1.fastq.gz
2018-11-16 16:10:13,010 pid=4258 [assembly.py] INFO interrupted engine 0 w/ SIGINT to 4303
2018-11-16 16:10:13,033 pid=4258 [assembly.py] INFO interrupted engine 4 w/ SIGINT to 4332
2018-11-16 16:10:13,057 pid=4258 [assembly.py] INFO interrupted engine 8 w/ SIGINT to 4401
2018-11-16 16:10:13,075 pid=4258 [assembly.py] INFO interrupted engine 12 w/ SIGINT to 4447
2018-11-16 16:10:14,090 pid=4258 [assembly.py] INFO shutting down engines
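
With the -d flag the log gets long, so it can help to pull out only the ERROR entries. A minimal Python sketch, assuming every entry follows the `timestamp pid=N [module] LEVEL message` layout seen in the lines above; `errors_in_log` is a hypothetical helper, not an ipyrad function.

```python
import re

# Layout assumed from the excerpt above:
# 2018-11-16 16:10:11,984 pid=4258 [assembly.py] ERROR invalid load key, '4'.
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"pid=(?P<pid>\d+) \[(?P<module>[^\]]+)\] "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def errors_in_log(lines):
    """Yield (time, pid, module, message) for each ERROR entry."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            yield (
                match.group("time"),
                int(match.group("pid")),
                match.group("module"),
                match.group("message"),
            )


sample = [
    "2018-11-16 16:10:11,984 pid=4258 [assembly.py] ERROR invalid load key, '4'.",
    "2018-11-16 16:10:13,010 pid=4258 [assembly.py] INFO interrupted engine 0 w/ SIGINT to 4303",
    "2018-11-16 16:10:14,090 pid=4258 [assembly.py] INFO shutting down engines",
]
errors = list(errors_in_log(sample))
for entry in errors:
    print(entry)
```

For a real run, the same function can be given an open file handle on ipyrad_log.txt instead of the sample list.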
Maribet Gamboa
@maribetg
Nov 16 2018 07:21
I noticed a file without the barcoding (Index_R1.fastq.gz). I removed it in case it was causing a problem with the barcode file and ran it again, but I got the same interrupted engine error.
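
The raw_fastq_path in the params file above is a wildcard pattern (./raw/P001/*fastq.gz), so any file in that directory ending in fastq.gz would be picked up, including an index file. A minimal Python sketch for listing what the pattern matches, assuming it is expanded like an ordinary shell glob; `list_matches` is a hypothetical helper, not part of ipyrad.

```python
import glob
import os


def list_matches(pattern):
    """Return (path, size in bytes) for every file matching the glob pattern."""
    return [(path, os.path.getsize(path)) for path in sorted(glob.glob(pattern))]


# Pattern copied from the params file above; adjust it to your own layout.
# Nothing is printed if no file matches.
for path, size in list_matches("./raw/P001/*fastq.gz"):
    print(f"{size / 1e9:8.2f} GB  {path}")
```

Any file in the listing that is not a raw read file for this library is a candidate to move out of the directory before rerunning step 1.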
gscabanne
@gscabanne
Nov 16 2018 11:36
Isaac, what follows is the params file I was using in the analysis that results in a very low number of clusters at step 3.

------- ipyrad params file (v.0.7.28)-------------------------------------------
Af ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/mnt/c/Demult2016/2016demultII/index1 ## [1] [project_dir]: Project dir (made in curdir if not present)
                           ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
Merged: index1II, index2II, index3II, index6II, index12II ## [3] [barcodes_path]: Location of barcodes file
                           ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
                           ## [6] [reference_sequence]: Location of reference sequence file
ddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAG, ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
5, 5 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8, 8 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
4 ## [21] [min_samples_locus]: Min # samples per locus for output
20, 20 ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
8, 8 ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
5, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, v ## [27] [output_formats]: Output formats (see docs)
                           ## [28] [pop_assign_file]: Path to population assignment file
gscabanne
@gscabanne
Nov 16 2018 18:33
Further comments on the previous problem: I repeated the analysis and noticed that step 3 freezes during aligning. Specifically, the % progress and the elapsed time both stop advancing. I wonder if this is a problem of available memory? I am running this in the Ubuntu subsystem of Windows 10, with 8 GB of RAM available.
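
One way to check the memory guess is to watch available memory while step 3 is running. A minimal Python sketch, assuming a Linux-style /proc/meminfo is readable (it may report fewer fields under the Windows subsystem); `parse_meminfo` and `available_gib` are hypothetical helpers for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages counts) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def available_gib(path="/proc/meminfo"):
    """Return available memory in GiB, read from the given meminfo file."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    # MemAvailable can be missing on some systems; fall back to MemFree.
    kib = info.get("MemAvailable", info.get("MemFree", 0))
    return kib / 1024 ** 2


sample = (
    "MemTotal:        8266700 kB\n"
    "MemFree:          123456 kB\n"
    "MemAvailable:    4194304 kB\n"
)
print(parse_meminfo(sample)["MemAvailable"])
```

Calling `available_gib()` every few seconds in a second terminal while the alignment runs would show whether free memory drops toward zero at the point where progress stops.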