alexkrohn
@alexkrohn
Ach. The last line should be ipyrad -p params-new-and-old.txt -s 34567 since they've both already done steps 1 and 2.
Isaac Overcast
@isaacovercast
@alexkrohn The -m parameter takes the name of the new assembly, so you could say -m new-and-old and it will create a file called params-new-and-old.txt. Other than that it looks good.
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Hi Isaac and Happy start of 2023. Unfortunately, I think ipyrad is not having a good start to the New Year while running on the cloud server. It looks like it could not finish step 3 despite having time. Could I bother you to check if it is still running? My interpretation is that it has stopped.
Top gives me: top - 14:26:17 up 19 days, 18:11, 1 user, load average: 0.28, 0.32, 0.18
Tasks: 378 total, 1 running, 267 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65826240 total, 2134084 free, 26130540 used, 37561616 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 38970964 avail Mem

@isaacovercast And sudo ps -aef | grep ipyrad gives me:

ec2-user 91051 1 0 2022 ? 00:00:02 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91072 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91075 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91077 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91081 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91082 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91085 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91088 91051 0 2022 ? 00:00:00 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91091 91051 0 2022 ? 00:20:02 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller

ec2-user 91096 1 0 2022 ? 00:06:50 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91099 1 0 2022 ? 00:06:28 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91102 1 0 2022 ? 00:06:32 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91105 1 0 2022 ? 00:06:20 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91108 1 0 2022 ? 00:06:17 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91111 1 0 2022 ? 00:06:25 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91115 1 0 2022 ? 00:06:12 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91120 91096 0 2022 ? 00:16:09 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny

ec2-user 91124 91099 0 2022 ? 00:16:13 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny

ec2-user 91125 1 0 2022 ? 00:06:09 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91131 91102 0 2022 ? 00:16:10 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny

ec2-user 91132 1 0 2022 ? 00:05:54 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91135 1 0 2022 ? 00:06:03 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91141 91105 0 2022 ? 00:16:13 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny

ec2-user 91143 1 0 2022 ? 00:35:11 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91148 91108 0 2022 ? 00:16:09 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny

ec2-user 91152 1 0 2022 ? 00:05:35 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91153 91111 0 2022 ? 00:16:17 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny

ec2-user 91169 1 0 2022 ? 00:05:41 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91172 91115 0 2022 ? 00:16:13 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine.nanny

ec2-user 91189 1 0 2022 ? 00:05:37 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

ec2-user 91196 1 0 2022 ? 00:05:35 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine

@isaacovercast Am I correct in interpreting that it has stopped? Thanks so much, A
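As an aside, a ps listing like the one above can be summarized mechanically. A minimal stdlib sketch (the two sample lines below just mimic the format of the output above) that counts ipyparallel controller vs. engine processes:

```python
# Count ipyparallel roles from `ps -aef`-style output. The sample text
# imitates two lines of the listing pasted above.
sample = """\
ec2-user 91051     1  0  2022 ?  00:00:02 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.controller
ec2-user 91096     1  0  2022 ?  00:06:50 /home/ec2-user/miniconda3/envs/ipyrad/bin/python3.10 -m ipyparallel.engine
"""

def count_roles(ps_text):
    """Return a dict mapping ipyparallel module name -> process count."""
    counts = {}
    for line in ps_text.splitlines():
        # The role is the module name after the last "-m ", e.g. ipyparallel.engine
        if "-m ipyparallel." in line:
            role = line.rsplit("-m ", 1)[1].split()[0]
            counts[role] = counts.get(role, 0) + 1
    return counts

print(count_roles(sample))  # {'ipyparallel.controller': 1, 'ipyparallel.engine': 1}
```

Run against the full listing, this gives a quick head count of the cluster (one controller plus its engines and nannies) without eyeballing every line.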
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter Happy new year to you too! ps shows the ipcluster is running, so ipyrad hasn't shut down. It's probably still working on one really big sample; this is what it looks like when ipyrad 'stalls' in step 3: almost everything is done except for one big sample. You can watch the clustering progress with ls -l *_clust* in the step 3 directory to see which files are changing. If it's actually still running, there should still be a couple of files with recent modification times. Check it out and let me know what you see.
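The same "are any files still changing?" check can be scripted. A stdlib-only sketch (the directory in the demo is a throwaway stand-in for the real step 3 clust directory):

```python
import os
import tempfile
import time

def recently_modified(directory, minutes=60):
    """Return names of files in `directory` modified within the last `minutes`."""
    cutoff = time.time() - minutes * 60
    return sorted(
        name for name in os.listdir(directory)
        if os.stat(os.path.join(directory, name)).st_mtime >= cutoff
    )

# Demo on a temporary directory standing in for e.g. myassembly_clust_0.85/.
demo = tempfile.mkdtemp()
open(os.path.join(demo, "sample1.clust.txt"), "w").close()
print(recently_modified(demo))  # ['sample1.clust.txt']
```

If this returns an empty list for the clust directory over a long stretch, that is consistent with the run having stalled rather than still clustering.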
shtbh
@shtbh
@isaacovercast Sorry for the late reply. My question was: in my resulting loci I have several ambiguous nucleotides like Y, R, S, etc., which can be present in 10-15% of the individuals. Is it possible to filter them out? My parameters: "max_alleles_consens: 2", "max_Ns_consens: 0.05", "max_Hs_consens: 0.05". An example to explain why I ask: if the common variant is T but some individuals have K (Guanine/Thymine), I can't be sure that it is a true variant... or am I wrong?
Since I'm doing genotype-environment association analysis, I'm wondering whether I should consider them or not. Thank you!
Isaac Overcast
@isaacovercast
@shtbh In which file are you looking exactly? The .alleles file? In the alleles file a K doesn't indicate an undetermined site; it indicates a heterozygous site, i.e. a SNP. It indicates more than one allele at that site, which is just as true a variant as anything else. In GxE analysis these are useful sites, so I would not filter them out.
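For reference, the letters in question are standard IUPAC nucleotide ambiguity codes; the snippet below is just an illustration of the mapping:

```python
# Standard IUPAC ambiguity codes for two-base (heterozygous) sites.
IUPAC = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "S": {"G", "C"},
    "W": {"A", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
}

# A "K" call at a site where the common variant is T means the individual
# carries both G and T alleles, i.e. a heterozygous SNP, not an unknown base.
print(sorted(IUPAC["K"]))  # ['G', 'T']
```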
shtbh
@shtbh
@isaacovercast I'm looking at the .loci file. OK, I understand, thank you very much!
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Thanks Isaac. I tried ls -l *_clust* and the last modifications are from Dec 20. The latest mods in the clust_0.85 directory are from Dec 23. Would it help to run some other command to see what happened? Thanks a bunch as always, A
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter Are you sure it didn't finish? Try ipyrad -p params-yourparams.txt -r. If it finished, all the samples will show 'state' == 3. Perhaps it finished and the ipcluster just didn't clean itself up.
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast I tried, but unfortunately they are all still in step 2. I was thinking that perhaps the thing to try is to remove some of those huge samples... and run step 3 again. Is there anything else I could try? Thanks a million, A
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter Did you check the ipyrad_log.txt file to see if it reported anything about dying? If it's running on an HPC and it got killed for running over time, then yeah, you will need to give it more time, or else split it into smaller batches, run them separately, and then merge them after step 3. That sounds like a headache, but it would work.
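The split-and-merge idea can be sketched out concretely. The batch names below are hypothetical, and this snippet only composes the command lines (using the -m merge syntax shown earlier in this chat) rather than running them:

```python
# Hypothetical batches: each is run through steps 1-3 on its own params
# file, then merged into one assembly for steps 4-7.
batches = ["batch1", "batch2", "batch3"]

step3_cmds = [f"ipyrad -p params-{b}.txt -s 123" for b in batches]
merge_cmd = "ipyrad -m combined " + " ".join(f"params-{b}.txt" for b in batches)
finish_cmd = "ipyrad -p params-combined.txt -s 4567"

for cmd in step3_cmds + [merge_cmd, finish_cmd]:
    print(cmd)
```

The merge writes a new params-combined.txt (named after the -m argument), which is then used for the remaining steps.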
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Isaac, I am running on a cloud server, a single one with 16 nodes. Will check on restrictions on time etc. Where would I find the log file? I can't see it doing ls or looking at the web-based index. Thanks as always, A
[ec2-user@ip-172-31-5-213 ipyrad]$ ls
CombGal2_edits galtest10.json galtest4_fastqs galtest7.json params-CombGal2.txt params-galtest5v2.txt
CombGal2.json galtest1_fastqs galtest4.json galtest8_fastqs params-CombGal3.txt params-galtest6v2.txt
CombGal3_clust_0.85 galtest1.json galtest5_fastqs galtest8.json params-galtest10v2.txt params-galtest7v2.txt
CombGal3.json galtest2_fastqs galtest5.json galtest9_fastqs params-galtest1v2.txt params-galtest8v2.txt
CombGal3-tmpalign galtest2.json galtest6_fastqs galtest9.json params-galtest2v2.txt params-galtest9v2.txt
Data galtest3_fastqs galtest6.json Incoming params-galtest3v2.txt params-template.txt
galtest10_fastqs galtest3.json galtest7_fastqs Miniconda3-latest-Linux-x86_64.sh params-galtest4v2.txt run-ipyrad.sh
Isaac Overcast
@isaacovercast
Ooooooh, wait, sorry, I forgot we don't have a log file anymore; I was thinking of one of my other programs. :(
Darn. Check on your time limits and let me know what you find. Good luck!
dnightin
@dnightin
Hi Isaac. Andrea's instance is dedicated to her in AWS; there are no time constraints.
dnightin
@dnightin
We increased the RAM to 128G from 64G and are trying step 3 again.
Isaac Overcast
@isaacovercast
@dnightin Ok great. Thanks for letting me know. Let me know how it goes.
Andrea Sequeira
@AndreaSequeir10_twitter

@isaacovercast Hi Isaac, unfortunately it stopped again. I am trying step 3 again, removing the huge samples... Is it possible that something about the parameters I chose is causing the problem? My params file is:
------- ipyrad params file (v.0.9.87)-------------------------------------------
CombGal4 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/ipyrad ## [1] [project_dir]: Project dir (made in curdir if not present)
Merged: galtest1, galtest2, galtest3, galtest4, galtest5, galtest6, galtest7, galtest8, galtest9, galtest10 ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
Merged: galtest1, galtest2, galtest3, galtest4, galtest5, galtest6, galtest7, galtest8, galtest9, galtest10 ## [3] [barcodes_path]: Location of barcodes file
Merged: galtest1, galtest2, galtest3, galtest4, galtest5, galtest6, galtest7, galtest8, galtest9, galtest10 ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)

                           ## [6] [reference_sequence]: Location of reference sequence file

pair3rad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
ATCGG, CGATCC ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
1 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
p, s, l ## [27] [output_formats]: Output formats (see docs)

                           ## [28] [pop_assign_file]: Path to population assignment file
                           ## [29] [reference_as_filter]: Reads mapped to this reference are removed in step 3
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter The parameters look fine to me. The 0.85 clustering threshold is 'phylogenetic' scale, so if these are all from the same genus or species it might be better to make this 0.9 or 0.91. This is just a tip; it will not make step 3 crash under normal circumstances. It's difficult to say what's happening without knowing more about why ipyrad is stopping. Do you have access to the console output of ipyrad? What does it say after it stops? Can you show me the full output from the console after ipyrad stops running?
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Thanks for the tips, Isaac. Because I have to run in the background due to connectivity instability with the instance, sometimes I don't have the console output when it stops. I am running it again and so far I continue to have console output. If it stops I will definitely send you what I have. Thanks a million as always, A
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Celebrating the completion of Step 3! (I removed some huge samples that I can live without). Only four more steps to go. Have a good weekend!
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter +1 +1 +1 +1 Awesome. Thanks for letting me know! Please keep me posted as the assembly continues, now I'm invested in seeing the results :)
dnightin
@dnightin
Hi Isaac. We were able to complete all seven steps using another terminal that didn't drop the connection mid-process.
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Hi Isaac, super happy to have finished the first assembly but want to try with the 0.9 clustering as you suggested. Can I run steps 3-7 or do I have to start from step 1? What do I need to leave constant in the params file so that it will not get confused? Thanks a million, A
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter What I would do is copy the output files to a new directory (e.g "outfiles_0.85"), then change the clust_threshold parameter to 0.9, and then run it again from step 3, including the '-f' flag to force overwriting previous results.
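Editing the params file by hand is fine, but the change can also be scripted. A stdlib-only sketch (the two-line params text below is a stand-in for a real params file, using the same `value ## [N] [name]` layout shown earlier in this chat):

```python
def set_param(params_text, field_tag, new_value):
    """Replace the value on the params line carrying e.g. the '[14]' tag."""
    out = []
    for line in params_text.splitlines():
        if field_tag in line and "##" in line:
            comment = line.split("##", 1)[1]
            # Left-pad the value so the ## comment column stays aligned.
            line = f"{new_value:<27}##{comment}"
        out.append(line)
    return "\n".join(out)

params = (
    "0.85                       ## [14] [clust_threshold]: Clustering threshold for de novo assembly\n"
    "1                          ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes"
)
updated = set_param(params, "[14]", "0.9")
print(updated.splitlines()[0])
```

Only the tagged line changes; everything else passes through untouched, so the file stays valid for the step 3 rerun with -f.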
alexkrohn
@alexkrohn
Hey @isaacovercast. I have a question about setting the number of cores. I'm running ipyrad on a Linux machine with 70 cores and 500GB RAM. Whenever I initiate ipyrad with > 57 cores, it hangs after showing the version and "Interactive assembly and analysis of RAD-seq data". For example, if I run ipyrad -p params.txt -s 1234567 -c 57, ipyrad starts without a problem. However, if I run ipyrad -p params.txt -s 1234567 -c 58 or any higher value for c, it stalls (even if I wait for hours). In the instance with -c 58, looking at top I see that it successfully started 58 instances of python3.10, but it doesn't proceed from there. Is it encountering a memory issue? 57 seems like such a random number to cause problems....
Isaac Overcast
@isaacovercast
@alexkrohn 57 does seem like quite a random number. My guess is that it has something to do with resource allocation on your machine. You could investigate this further by launching an ipcluster instance with -n 58 and including -vv to print debug information; that might help you figure out why ipcluster is getting stuck. Alternatively (and this is what I would recommend), just let this discontinuity remain a mystery, run your ipyrad job with 57 cores, and call it a day. Unless you really want to solve the mystery of the 58 cores, 57 will run in more or less the same time as 70.
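To put a rough number on "more or less the same time": even under perfect parallel scaling, the 13 extra cores cap the speedup at 70/57, and in practice step 3 is often bottlenecked by the single largest sample anyway.

```python
# Best-case (perfectly parallel) speedup from running on 70 cores
# instead of 57; real gains will be smaller.
max_speedup = 70 / 57
print(f"{max_speedup:.2f}x")  # 1.23x
```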
alexkrohn
@alexkrohn
@isaacovercast I like your approach. I agree -- most of ipyrad runs with fewer than c cores, so the time increase won't be that great. I'll see what I can find with launching a standalone ipcluster instance.
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Hi Isaac, I was wondering what version of Python is needed to run the API packages? I am having dependency trouble installing scikit-learn and toyplot... Would the cookbooks from way back when still work if I want to go that way..? Thanks a million (I have two assemblies.. yay!)
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter Hi Andrea, congrats on getting 2 assemblies done! The API mode should work fine with any version of python that runs ipyrad on the command line. What version are you using? Python 3.10 or 3.9 should work fine. If all else fails, try installing everything again in a new, clean conda environment; sometimes conda dependencies get messed up.
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Hi Isaac! Unfortunately I am running into install snags for the API. I deleted miniconda and reinstalled both miniconda and ipyrad; it could solve the environment for that, and later I also successfully re-installed notebook and mpi4py, but it cannot install scikit-learn. It finds unsatisfiable errors. Could you please direct me to the best place to follow the install directions in case I am missing something? I am using the readthedocs and then the API sections... I was starting with the PCA software reqs... Any ideas are welcome, thanks as always, A
Isaac Overcast
@isaacovercast
@AndreaSequeir10_twitter You shouldn't need to install mpi4py for the API mode to work, so I would skip that. What version of python are you installing for? Run python -V inside the conda env. I just tried on python 3.10.8 and 3.9.12, and the ipyrad/scikit-learn install worked fine for me on both of these versions.
ckiel3
@ckiel3
Hi Isaac. Does the latest version of ipyrad allow for a denovo+reference assembly? I see far back in the chat (2020) that it works for the 0.7 version. I have version v.0.9.81 and it doesn't seem to like that reference type. If it's not supported, is there a way to get the older version of ipyrad where it works? I tried conda install -c ipyrad -c conda-forge ipyrad, as previously suggested in 2020, but I get: PackagesNotFoundError: The following packages are not available from current channels:
  • ipyrad
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Hi Isaac: Thanks for checking. I am using Python 3.10.8... scikit-learn is still giving me trouble solving... I created a new environment and I will try again...
Phismil
@Phismil
Dear @isaacovercast, I wish you a happy 2023. I recently had some issues with window_extracter.py on some hdf5 files. The error points at window_extracter.py, line 148, in __init__ (self._parse_scaffolds()), and complains about:
self.pnames = self.pnames[self.sidxs]
TypeError: list indices must be integers or slices, not list
Thank you in advance for any advice
Isaac Overcast
@isaacovercast
@ckiel3 The 0.9 version of ipyrad doesn't support denovo+reference, unfortunately. Getting the 0.7 version installed would be quite difficult I would imagine. You could pull the tgz file from the github repository and then try to install the dependencies by hand, but this would be tricky. denovo+reference was a good idea, but we found in practice that it wasn't much better than denovo and/or reference alone in most cases.
@AndreaSequeir10_twitter Any luck with the install of scikit?
@Phismil Can you please show me the version of ipyrad you are running? Also show me all the code in the jupyter notebook that you're running that generates this error?
Phismil
@Phismil
Thanks @isaacovercast, here is the code. I run it on an HPC.
(ipyrad) [login2 ~ 18:05:39 ]$ ipyrad -v
ipyrad 0.9.87
Python
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
import ipyrad.analysis as ipa
import toytree
seqfile = "Test.seqs.hdf5"
ext = ipa.window_extracter(seqfile)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/lustre/users/Miniconda3/envs/ipyrad/lib/python3.10/site-packages/ipyrad/analysis/window_extracter.py", line 148, in __init__
self._parse_scaffolds()
File "/mnt/lustre/users/Miniconda3/envs/ipyrad/lib/python3.10/site-packages/ipyrad/analysis/window_extracter.py", line 342, in _parse_scaffolds
self.pnames = self.pnames[self.sidxs]
TypeError: list indices must be integers or slices, not list
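The exception itself is generic Python behavior, independent of ipyrad: a plain list cannot be indexed with a list of indices (only numpy arrays support that kind of fancy indexing). A minimal reproduction, with the workaround pattern the code presumably intends:

```python
# Stand-ins for the attributes in the traceback above.
pnames = ["scaff0", "scaff1", "scaff2"]
sidxs = [0, 2]

try:
    pnames[sidxs]          # same failure mode as in window_extracter
except TypeError as err:
    print(err)             # list indices must be integers or slices, not list

# Equivalent fancy indexing done with the stdlib:
subset = [pnames[i] for i in sidxs]
print(subset)  # ['scaff0', 'scaff2']
```

This is why the hdf5's provenance matters here: if self.pnames arrives as a numpy array the indexing works, but if some code path leaves it as a plain list, this exact TypeError is raised.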
Andrea Sequeira
@AndreaSequeir10_twitter
@isaacovercast Hi Isaac, thanks so much for remembering!!! I had to dig around a bit (and thanks to students that graduate and become so much better than me at this and help me out..). I found that the conflict was between the scikit-learn version and the Python version I had installed, so by asking for scikit-learn 1.0.2, which I think is compatible with Python 3.10, it worked! I am now fighting to install STRUCTURE... may come back to ask about that... :)
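For anyone hitting the same solver conflict, the combination reported to work in this thread can be captured in a conda environment file. This is a sketch: the environment name is arbitrary and the channel choices are assumptions, not something stated in the chat.

```yaml
# environment.yml -- pins reported to work together in this thread
name: ipyrad-api
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10
  - ipyrad
  - scikit-learn=1.0.2
  - toyplot
  - notebook
```

Creating the environment from this file (conda env create -f environment.yml) lets the solver resolve all the pins at once instead of fighting conflicts one install at a time.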
Isaac Overcast
@isaacovercast
@Phismil Is the hdf5 file from the same version of ipyrad? If it's an hdf5 from an older version (which I have seen before) this can cause problems.
Phismil
@Phismil
I moved my previous replies to the thread such that others with the same issue can follow your suggestions. Yes, the hdf5 was generated by the latest Ipyrad version. Thank you in advance @isaacovercast
Isaac Overcast
@isaacovercast
@Phismil What version of numpy are you using?
Isaac Overcast
@isaacovercast
conda list | grep numpy
It works for me with np 1.17 and 1.21. Maybe you have an older version of numpy? Can you wetransfer me the Test.seqs.hdf5 file?
Phismil
@Phismil
Thank you @isaacovercast for your time. It was shared with your GitHub Gmail address. Cheers
cnehrke
@cnehrke

Hi ipyrad Team,

I have a question regarding the handling of reads in reference assisted assemblies.
We have a paired-end GBS dataset and did both denovo and reference assemblies with different parameters.
The draft reference covered almost the complete haploid genome size (~380 Mbp in the draft reference, ~400 Mbp haploid genome size). We observed that we retrieve approximately ten times more loci from the reference analysis. This is not a real surprise, since up to 90% of eukaryotic genomes seem to be repetitive regions. However, we asked ourselves how ipyrad actually handles reads in repeated regions.
Does ipyrad prioritize achieving full coverage of the reference before clustering identical (or almost identical, depending on the clustering threshold) reads within each sample?
Best,
Christoph

Isaac Overcast
@isaacovercast
@cnehrke The 'denovo' assembly only looks at % sequence similarity when clustering reads. The 'reference' assembly uses bwa internally, so we inherit the pros and cons of bwa with respect to repeat regions. For 'denovo' it is almost certainly clustering paralogous repetitive regions (which can be identified and removed). For 'reference' it's possible that the 10x more loci reflect misalignment to some degree, so I would inspect the results a little before proceeding with downstream analyses. Hope that helps.