@isaacovercast And sudo ps -aef | grep ipyrad gives me:
ec2-user 91051 1 0 2022 ? 00:00:02 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91072 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91075 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91077 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91081 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91082 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91085 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91088 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91091 91051 0 2022 ? 00:20:02 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91096 1 0 2022 ? 00:06:50 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91099 1 0 2022 ? 00:06:28 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91102 1 0 2022 ? 00:06:32 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91105 1 0 2022 ? 00:06:20 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91108 1 0 2022 ? 00:06:17 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91111 1 0 2022 ? 00:06:25 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91115 1 0 2022 ? 00:06:12 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91120 91096 0 2022 ? 00:16:09 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny
ec2-user 91124 91099 0 2022 ? 00:16:13 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny
ec2-user 91125 1 0 2022 ? 00:06:09 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91131 91102 0 2022 ? 00:16:10 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny
ec2-user 91132 1 0 2022 ? 00:05:54 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91135 1 0 2022 ? 00:06:03 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91141 91105 0 2022 ? 00:16:13 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny
ec2-user 91143 1 0 2022 ? 00:35:11 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91148 91108 0 2022 ? 00:16:09 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny
ec2-user 91152 1 0 2022 ? 00:05:35 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91153 91111 0 2022 ? 00:16:17 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny
ec2-user 91169 1 0 2022 ? 00:05:41 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91172 91115 0 2022 ? 00:16:13 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny
ec2-user 91189 1 0 2022 ? 00:05:37 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
ec2-user 91196 1 0 2022 ? 00:05:35 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
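(As an aside: the listing above only shows that the ipyparallel controller and engine processes exist, not whether they are doing any work. A couple of generic shell checks, not specific to ipyrad, can tell the two apart; the PIDs used below are just examples taken from the listing above.)

# Show full command lines in batch mode, sorted by CPU; idle engines sit near 0%.
top -b -c -n 1 -o %CPU | grep ipyparallel | head
# Or compare cumulative CPU time (TIME column) between two snapshots a few minutes apart;
# if it is not growing for any engine, nothing is being computed.
ps -o pid,etime,time,%cpu,%mem -p 91096,91099,91143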
You can run
ls -l *_clust*
to look at the files that are changing in the step 3 directory. There should still be a couple of files with recent modification times if it's actually still running. Check it out and let me know what you see. Did you also check the
ipyrad_log.txt
file to see if it reported anything about dying? If it's running on an HPC and it got killed for running over time, then yeah, you will need to give it more time, or else maybe split it into smaller batches, run them separately, and then merge them after step 3. That sounds like a headache, but it would work.
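(For example, the two checks above can be combined like this; it assumes the default ipyrad_log.txt name and that the step 3 directory matches the *_clust* pattern mentioned above:)

ls -lt *_clust*/ | head     # most recently modified clustering files first
tail -n 50 ipyrad_log.txt   # last log lines, in case an error was recorded before it died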
@isaacovercast Hi Isaac, unfortunately it stopped again. I am trying step 3 again, removing the huge samples... Is it possible that there is something about the parameters that I chose that is causing the problem? My params file is:
------- ipyrad params file (v.0.9.87)-------------------------------------------
CombGal4 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/ipyrad ## [1] [project_dir]: Project dir (made in curdir if not present)
Merged: galtest1, galtest2, galtest3, galtest4, galtest5, galtest6, galtest7, galtest8, galtest9, galtest10 ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
Merged: galtest1, galtest2, galtest3, galtest4, galtest5, galtest6, galtest7, galtest8, galtest9, galtest10 ## [3] [barcodes_path]: Location of barcodes file
Merged: galtest1, galtest2, galtest3, galtest4, galtest5, galtest6, galtest7, galtest8, galtest9, galtest10 ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)
## [6] [reference_sequence]: Location of reference sequence file
pair3rad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
ATCGG, CGATCC ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
1 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, l ## [27] [output_formats]: Output formats (see docs)
## [28] [pop_assign_file]: Path to population assignment file
## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3
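(Since this assembly is a merge of the galtest1 through galtest10 batches, the batch-and-merge route suggested earlier could look roughly like the sketch below. The params-galtest*.txt file names are assumptions about how the batch params files are named; ipyrad's -m flag merges assemblies and writes a new params file for the merged result.)

# Run the slow step on each batch on its own (only two shown; repeat for the rest):
ipyrad -p params-galtest1.txt -s 3 -c 57
ipyrad -p params-galtest2.txt -s 3 -c 57
# Merge the finished batches and continue from step 4 on the combined assembly:
ipyrad -m CombGal4_batched params-galtest1.txt params-galtest2.txt
ipyrad -p params-CombGal4_batched.txt -s 4567 -c 57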
If I run
ipyrad -p params.txt -s 1234567 -c 57
ipyrad starts without a problem. However, if I run
ipyrad -p params.txt -s 1234567 -c 58
or any higher value for -c, it stalls (even if I wait for hours). In the instance with -c 58, looking at top I see that it successfully started 58 instances of python3.10, but it doesn't proceed from there. Is it encountering a memory issue? 57 seems like such a random number to cause problems....
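(A quick, tool-agnostic way to rule memory in or out is to watch free memory while the 58 engines start up.)

# Report memory every 2 seconds while ipyrad launches its engines;
# if "available" collapses toward zero only at -c 58, memory is the likely culprit.
free -h -s 2
# Also worth confirming how many cores the machine actually exposes:
nproc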
You could try starting an
ipcluster
instance with -n 58 and including -vv to print debug information; that might help you figure out why ipcluster is getting stuck. Alternatively (and this is what I would recommend), just let this discontinuity remain a mystery and run your ipyrad job with 57 cores and call it a day. Unless you really want to solve the mystery of the 58 cores, 57 will run in more or less the same time as 70.
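(If you do want to chase it, a minimal sketch of that debugging route is below: start the cluster by hand with verbose logging and attach the ipyrad run to it. --debug is the generic verbose switch for ipcluster; check ipyrad -h for the exact attach option in your ipyrad version.)

# In one terminal, start 58 engines with debug-level logging in the foreground:
ipcluster start -n 58 --debug
# In a second terminal, attach the ipyrad run to that already-running cluster:
ipyrad -p params.txt -s 1234567 --ipcluster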
What does
python -V
report inside the conda env? I just tried on python 3.10.8 and 3.9.12 and the ipyrad/scikit-learn install worked fine for me on both of these versions.
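(For completeness, a quick sanity check inside the activated env that covers both the interpreter and the two packages in question:)

python -V
python -c "import ipyrad, sklearn; print(ipyrad.__version__, sklearn.__version__)"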
Hi ipyrad Team,
I have a question regarding the handling of reads in reference assisted assemblies.
We have a paired-end GBS dataset and did both denovo and reference assemblies with different parameters.
The draft reference covered almost the complete haploid genome size (~380 Mbp in the draft reference, ~400 Mbp haploid genome size). We observed that we retrieve approximately ten times more loci from the reference analysis. This is not a real surprise, since up to 90% of eukaryotic genomes seem to be repetitive regions. However, we asked ourselves how ipyrad actually handles reads in repeated regions.
Does ipyrad prioritize achieving full coverage of the reference before clustering identical (or almost identical, depending on the clustering threshold) reads within each sample?
Best,
Christoph