Isaac Overcast
@isaacovercast
@mergi_gitlab What are you trying to do to generate the phylogenetic tree? How is it not working?
Isaac Overcast
@isaacovercast
@biohiroto Whoops, I meant these last questions for you.
What are you trying, and what exactly is it doing? Any error messages are useful.
Mergi Daba
@mergi_gitlab
@isaacovercast I was generating a phylogenetic tree for a family with equivocal relationships. In the preliminary analysis I used ipyrad 0.7.30 with a denovo assembly and a 0.80 clustering threshold. That tree had bootstrap values > 80, but with ipyrad 0.9.62 they are lower than what I got earlier. That is why I needed the 0.7.30 version.
Isaac Overcast
@isaacovercast
@mergi_gitlab In general, unless it is shown otherwise, the most recent version of ipyrad will generate the most accurate assembly. If the family has equivocal relationships, then perhaps the (slightly) lower bootstrap support with the new version is an accurate reflection of the uncertainty of the relationships.
biohiroto
@biohiroto
@isaacovercast Sorry for my obscure question. I'm trying to get concordance factor values for each node on my phylogenetic tree using the phylogenetic analysis software IQ-TREE (http://www.iqtree.org). IQ-TREE first fits an evolutionary model to each locus, in other words each partition in the NEXUS format, then it constructs a whole tree with the concordance factor values. So I want to separate my sequence data into partitions. What I see in the SETS block of my NEXUS file is like this:

BEGIN SETS;
charset 0 = 0-245;
charset 1 = 245-449;
charset 2 = 449-680;
charset 3 = 680-880;
charset 4 = 880-1059;

In this case charset 0 and charset 1 share the 245th site, for example, but I don't want any site to be included in different partitions at the same time. Is this the default way ipyrad separates sequences into partitions? If so, is there any way to separate them into non-overlapping partitions as I want?
Isaac Overcast
@isaacovercast
@biohiroto Looks like an off-by-one error in our nexus output. I fixed this and pushed the change to 0.9.63, which should be up on the bioconda channel within a day or so.
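For illustration only (coordinates hypothetical, not the actual output), after the fix the charsets should tile without sharing boundary sites:
charset 0 = 0-244;
charset 1 = 245-448;
charset 2 = 449-679;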
biohiroto
@biohiroto
@isaacovercast Thank you for changing the code so promptly. It will help me a lot for sure! I'll gladly wait for it.
Kyle O'Connell
@kyleaoconnell22
Hi @isaacovercast I have a step 3 question similar to the ones above. I am running 150bp PE GBS samples (n=50) on a cluster, with 2-12 million reads per sample. Step 3 is taking a long time. I cranked things up to 48 cores with 20GB RAM. Would it speed things up further to use less RAM and more threads, or fewer threads and more RAM? Would it make sense to just use the R1s in this case to save time? Thanks!
Isaac Overcast
@isaacovercast
@kyleaoconnell22 Hey Kyle. We recommend 4GB of RAM per core, so when you say 20GB RAM is this 20 TOTAL? For longer reads you'll need more RAM per core. If you don't have >4GB RAM per core then clustering will get SUPER slow. Max out the RAM, as much as you can get, and reduce the number of cores so there is at least 4GB per core.
Alternatively, if you just run R1 you can get results quickly and start developing your downstream analysis; then you can go back and run R1+R2 while you monkey with the downstream stuff, and it won't matter how slow it is.
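For example (params file name hypothetical), on a node with 40GB of total RAM you could run step 3 on 10 cores to keep 4GB per core:
$ ipyrad -p params-data.txt -s 3 -c 10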
Kyle O'Connell
@kyleaoconnell22
Thanks @isaacovercast , I have 20GB of RAM per core. It seemed to crash with less than that in my test runs. I think I will try for just R1 so I can start playing with the data. Thanks again for the help!
davidomartinez
@davidomartinez
Hi Isaac, Deren, and any other developers of ipyrad: I have a comment/suggestion. As far as I understand from the ipyrad documentation, the output files reflecting single SNPs per locus (i.e. ustr, usnps) choose that SNP by minimizing the amount of missing data, and then randomly among all available SNPs from a particular locus if several SNPs have the same amount of missing data. In several datasets that I have analyzed, this has meant that more than half of my loci end up being represented by SNPs that are autapomorphic for a single taxon or that just show intraindividual polymorphism. Those SNPs do not bring any useful information to downstream phylogenetic analyses and are perhaps also uninformative for genetic clustering analyses (although I am not sure of the latter). Therefore, choosing SNPs randomly in each locus often indirectly discards the information contained in that locus, a high price. If that is the case, I would suggest that future versions of ipyrad choose those SNPs differently: e.g. enforce choosing PiS if any are available. Or am I missing something important here? Thanks a lot for developing ipyrad.
Isaac Overcast
@isaacovercast
@davidomartinez Right now the SNPs for the unlinked output files are randomly sampled; there is no further missing-data minimization beyond the min_samples_locus filter. Since the unlinked outputs are generated randomly each time you run step 7, you can run it multiple times, get multiple output files, and average the results of your downstream analyses. This is actually the reason we implemented the ipyrad tools the way we did: to make resampling unlinked sites and re-running downstream analyses super easy.
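A minimal sketch of that workflow in the Python API (file names hypothetical, and assuming a finished assembly):
import shutil
import ipyrad as ip
# load a finished assembly and re-run step 7 several times, each run
# drawing an independent random sample of one SNP per locus
data = ip.load_json("mydata.json")
for rep in range(5):
    data.run("7", force=True)
    # the unlinked-SNPs output is overwritten each run, so stash a copy
    shutil.copy("mydata_outfiles/mydata.usnps",
                "mydata_outfiles/mydata_rep{}.usnps".format(rep))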
Isaac Overcast
@isaacovercast
@davidomartinez In terms of enforcing the choice of PiS during sampling: while this might increase the information content of the data, non-random sampling of SNPs per locus would artificially inflate diversity and introduce bias.
@davidomartinez Glad you like ipyrad! Thanks for the feedback.
davidomartinez
@davidomartinez
Thanks for your response, Isaac!
milkandrelish
@milkandrelish

Hi @isaacovercast I am new to programming, to ipyrad, and to computing on a cluster. I have run ipyrad successfully on my local machine and also on my institution's cluster, but I am running into issues running ipyrad on multiple nodes. The help documentation suggests using "module load MPI". My institution has a ton of MPI modules, and I am not sure which one is appropriate. Are you able to help me out? Does the --MPI flag still work with these? Anything else special I need to do?

$ module avail openmpi

------------------------------ /etc/modulefiles -------------------------------
openmpi-1.4-psm-x86_64 openmpi-1.5.3-psm-x86_64
openmpi-1.4-x86_64 openmpi-1.5.3-x86_64

------------------------------ /act/modulefiles -------------------------------
openmpi-1.10.0-intel openmpi-1.8-intel
openmpi-1.6/gcc openmpi-1.8-intel-14.0.4
openmpi-1.6/gcc-4.7.2 openmpi-1.8-psm/gcc
openmpi-1.6/lua-5.3.0 openmpi-1.8-psm/gcc-4.7.2
openmpi-1.6-psm/gcc openmpi-2.0.0-intel-15.0.1
openmpi-1.6-psm/gcc-4.7.2 openmpi-2.0.0-intel-16.0.3
openmpi-1.8/gcc openmpi-2.1.0-intel-17.0.2
openmpi-1.8/gcc-4.7.2 openmpi-2.1.3-intel-17.0.6

$ module avail mvapich

------------------------------ /act/modulefiles -------------------------------
mvapich/gcc mvapich2-2.0-psm/gcc-4.7.2
mvapich/gcc-4.7.2 mvapich2-2.1-intel-14.0.4
mvapich2-2.0/gcc mvapich2-2.1-intel-15.0.1
mvapich2-2.0/gcc-4.7.2 mvapich2-2.2-psm
mvapich2-2.0-psm/gcc

Isaac Overcast
@isaacovercast
@milkandrelish Let me ask you a question: Do you need to use multiple nodes? Unless your assembly is truly massive (or the nodes of your cluster are relatively weak) then you shouldn't need multiple nodes, in which case the added complexity of figuring out MPI is unnecessary.
@milkandrelish I assume you were looking at this page in the docs: https://ipyrad.readthedocs.io/en/latest/HPC_script.html?highlight=hpc
@milkandrelish In terms of getting it working, your cluster admins are the best people to ask about which MPI module to load. As you can see there are many different ones, and every cluster behaves a little differently, so I won't be able to help with that.
milkandrelish
@milkandrelish
@isaacovercast Thank you, I do not need multiple nodes, but I am hoping to run many assemblies at once. I will work with a single node until I can figure this out with my cluster admins. I appreciate your quick response, and yes that is the help doc I am reading!
Isaac Overcast
@isaacovercast
@milkandrelish You don't need MPI to run many assemblies at once, just launch each assembly on a different node. Also, is this many assemblies of the same data or of different data? If you are running different assemblies of the same data you should be very careful, or else there will be problems with ipyrad stepping on its own temp files, etc.
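ipyrad's branching feature is designed for this case: each branch gets its own name and working files. A sketch (params file names and the parameter change are hypothetical):
$ ipyrad -p params-data.txt -b data_clust90
# edit clust_threshold in the new params-data_clust90.txt, then:
$ ipyrad -p params-data_clust90.txt -s 34567 -c 24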
Safiqul713
@safiqu

I get the error below but I don't understand why, or what the source of the problem is. Dear Isaac, could you please suggest what I did wrong?

$ ipyrad -p params-parus_whole.txt -s 1

Traceback (most recent call last):
  File "/anaconda3/bin/ipyrad", line 10, in <module>
    sys.exit(main())
  File "/anaconda3/lib/python3.7/site-packages/ipyrad/__main__.py", line 598, in main
    CLI()
  File "/anaconda3/lib/python3.7/site-packages/ipyrad/__main__.py", line 69, in __init__
    self.get_assembly()
  File "/anaconda3/lib/python3.7/site-packages/ipyrad/__main__.py", line 369, in get_assembly
    data.set_params(key, param)
  File "/anaconda3/lib/python3.7/site-packages/ipyrad/core/assembly.py", line 492, in set_params
    setattr(self.params, param, newvalue)
  File "/anaconda3/lib/python3.7/site-packages/ipyrad/core/params.py", line 730, in pop_assign_file
    self._data._link_populations()
  File "/anaconda3/lib/python3.7/site-packages/ipyrad/core/assembly.py", line 401, in _link_populations
    for i in minlist})
  File "/anaconda3/lib/python3.7/site-packages/ipyrad/core/assembly.py", line 401, in <dictcomp>
    for i in minlist})
IndexError: list index out of range

Isaac Overcast
@isaacovercast
@biohiroto Did you ever get a chance to test IQ-Tree with the updated and fixed nexus file?
@safiqu Your pop_assign_file is malformed. You probably don't have the last line specifying the number of samples per pop; see the online docs for the proper formatting. Also, this file doesn't actually do anything until step 7.
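For reference, a minimal pop_assign_file looks something like this (sample and population names hypothetical; the last, commented line gives the min samples required per pop):
sample1 pop1
sample2 pop1
sample3 pop2
sample4 pop2
# pop1:2 pop2:2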
milkandrelish
@milkandrelish

Hi @isaacovercast thank you for your help earlier. I got everything up and running smoothly. I have been trying to troubleshoot an issue for the last couple of days, and have had no luck finding answers elsewhere. I am hoping you might know what to do, or at least point me in the right direction.

I am running STRUCTURE using the instructions here: https://ipyrad.readthedocs.io/en/latest/API-analysis/cookbook-structure.html# and I am running Jupyter notebook on my institution's HPC cluster using these instructions: https://github.com/dereneaton/ipyrad/blob/master/docs/HPC_Tunnel.rst

I have Jupyter notebook up and running and I can get steps/boxes 1-8 of the STRUCTURE cookbook to seemingly run without errors, but when I try to run box 9 (creating an etable under the header Analyze results: Choosing K), I get the following error: "IndexError: list index out of range".

This does not happen when I run the same exact thing on my local machine. Any thoughts here?

Isaac Overcast
@isaacovercast
@milkandrelish So structure runs fine? Have you verified that it is creating output files? It kind of feels like structure isn't installed on the cluster. Are you sure it's installed? Can you verify that structure is installed and working in your conda env on the cluster?
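For example (assuming a conda install), something like this should confirm the binary is present and actually executes:
$ conda list structure
$ structure   # with no arguments it should at least run (it will complain about missing param files)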
milkandrelish
@milkandrelish
@isaacovercast Hi Isaac, thank you for this suggestion! It was installed, but I didn't get the glibc-2.14 module loaded, which turned out to be the problem!
biohiroto
@biohiroto
@isaacovercast Thank you for fixing the off-by-one errors in the nexus output. My output file is now written properly and the error message disappeared, but I got another warning from iqtree: "WARNING: Estimated model parameters are at boundary that can cause numerical instability!". When I previously asked the iqtree developers about the off-by-one errors, they said the number at which the charsets start could also be a factor in some errors, meaning they shouldn't start at 0. I'm not sure this warning is related to that, but I would appreciate your help with it.
quattrinia
@quattrinia
Have any of you come across this error when trying to run ipyrad in a jupyter notebook?

  File "/Users/quattrinia/miniconda3/lib/python3.7/site-packages/ipyrad/core/Parallel.py", line 314, in wrap_run
    self.tool._run(ipyclient=self.ipyclient, **self.rkwargs)
  File "/Users/quattrinia/miniconda3/lib/python3.7/site-packages/ipyrad/core/assembly.py", line 691, in _run
    stepdict[step].run()
  File "/Users/quattrinia/miniconda3/lib/python3.7/site-packages/ipyrad/assemble/write_outputs.py", line 47, in __init__
    self.samples = self.get_subsamples()
  File "/Users/quattrinia/miniconda3/lib/python3.7/site-packages/ipyrad/assemble/write_outputs.py", line 108, in get_subsamples
    dbsamples = inloci.readline()[1:].strip().split(",@")
  File "/Users/quattrinia/miniconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
VictorFitz
@VictorFitz
Hello, I'm having a similar issue with step 3 as Kyle was having earlier in the chat. I'm running a pairddrad assembly with 60 samples ranging from about 1-10 million reads. Step 3 is taking a very long time: approximately 48 hr to get to 1% progress. I'm running ipyrad on 1 node with 24 cores, which seems to be sufficient in nearly all other cases. I've previously trimmed my samples using trimmomatic and have my phred score set to 33. Is there anything I can do to speed up the process, or is this something that simply needs to be waited out? Thanks
Isaac Overcast
@isaacovercast
@quattrinia What version of ipyrad are you on? Have you run the whole assembly with this same version?
@VictorFitz How much RAM per core?
@VictorFitz I assume this is PE data. You can use the '-t' flag to specify the number of threads for clustering (the default is 2). In practice this will help a little, but don't expect a 10x speedup.
You can also try allocating more RAM per core. We recommend 4GB, but if your dataset is very large then more is necessary. It can even be the case that reducing the number of cores (to increase RAM per core) speeds up the process, by preventing swapping to disk.
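For example (params file name hypothetical), dropping from 24 cores to 12 with 4 clustering threads each:
$ ipyrad -p params-data.txt -s 3 -c 12 -t 4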
Isaac Overcast
@isaacovercast
@quattrinia If you are not on the most recent version please update and try again.
Isaac Overcast
@isaacovercast
@biohiroto It sounds like iqtree is running fine. This warning, "WARNING: Estimated model parameters are at boundary that can cause numerical instability!", seems to be telling you the real problem: the parameter boundaries are too restrictive. If you're not sure the message is related to the charset start value, I'm not sure what I can do to help you.
Qiuyu Jiang
@QiuyuJiang_twitter

@isaacovercast Hi, Isaac!
I finished running a RAD-seq assembly for paired-end data and successfully generated output, but I only got .loci, .phy, .nex, and so on.

However, for my downstream analysis (searching for SSRs and designing primers from them) I will need .fasta.

Do you have any idea how I can convert any of these to .fasta for the next step?

Sorry, I don't know whether it's ok to ask you about this here. I would really appreciate your help!

Isaac Overcast
@isaacovercast
@QiuyuJiang_twitter It's totally ok to ask about this here! The .phy output file is probably closest to a concatenated fasta format, but you'll have to do a little work to reformat it. Shouldn't be too hard. Good luck!
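A minimal sketch in Python (file names hypothetical; assumes the one-sample-per-line .phy layout):
# convert the .phy output to a concatenated fasta
with open("data.phy") as phy, open("data.fasta", "w") as fasta:
    phy.readline()  # skip the "ntaxa nsites" header line
    for line in phy:
        if line.strip():
            name, seq = line.split(None, 1)
            fasta.write(">" + name + "\n" + seq.strip() + "\n")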
quattrinia
@quattrinia
@isaacovercast my issue ended up being that I was trying to load an assembly file from a previous ipyrad version into the newer version. Unfortunately, I just had to start from scratch (but it didn't take too long!)
andreeab917
@andreeab917
Hi there, I have been running step 3 of ipyrad for 3 days on 2 nodes, and it seems that the CPU/memory usage efficiency isn't great. I attached the usage output. Is this normal when running on 2 nodes? I ran it on one node a few months ago and the run time was 10+ days.

Step 3: Clustering/Mapping reads within samples
skipping samples not yet in state==2:
['CAYU_15A']
[####################] 100% 0:03:13 | join merged pairs
[####################] 100% 0:02:46 | join unmerged pairs
[####################] 100% 0:02:47 | dereplicating
[################## ] 92% 2 days, 22:11:29 | clustering/mapping

Please check with other users in your lab or the author to see if this is expected.

[andreeab@gra490 ~]$ top
top - 09:25:09 up 8 days, 18:22, 1 user, load average: 3.06, 3.05, 3.07
Tasks: 579 total, 1 running, 578 sleeping, 0 stopped, 0 zombie
%Cpu(s): 12.1 us, 0.1 sy, 0.0 ni, 87.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13162476+total, 97422176 free, 27608988 used, 6593596 buff/cache
KiB Swap: 10485756 total, 10485756 free, 0 used. 99957064 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
7761 andreeab 20 0 9902.0m 4.662g 356 S 132.1 3.7 5533:59 14 vsearch
7755 andreeab 20 0 18.576g 7.805g 352 S 128.9 6.2 5286:43 2 vsearch
7762 andreeab 20 0 6107388 3.588g 356 S 125.2 2.9 5047:53 12 vsearch
24752 andreeab 20 0 139464 2256 1248 R 1.6 0.0 0:00.50 20 top
4446 andreeab 20 0 739740 166384 15080 S 0.3 0.1 22:37.56 22 ipyrad
4465 andreeab 20 0 100128 37240 724 S 0.3 0.0 6:41.43 24 ipcluster
4467 andreeab 20 0 458528 147332 3940 S 0.3 0.1 12:10.94 26 python3.7
4512 andreeab 20 0 992392 194532 13864 S 0.3 0.1 1:20.33 26 python3.7

[andreeab@gra553 ~]$ top -u $USER
top - 09:34:46 up 8 days, 18:31, 1 user, load average: 3.17, 3.24, 3.26
Tasks: 558 total, 1 running, 557 sleeping, 0 stopped, 0 zombie
%Cpu(s): 46.6 us, 1.4 sy, 0.0 ni, 52.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13162476+total, 94148064 free, 31876356 used, 5600340 buff/cache
KiB Swap: 10485756 total, 10311164 free, 174592 used. 95985096 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
24937 andreeab 20 0 28.107g 0.013t 352 S 195.0 10.4 5406:59 8 vsearch
24939 andreeab 20 0 10.774g 4.575g 352 S 100.0 3.6 5519:20 7 vsearch
24940 andreeab 20 0 7635928 3.966g 352 S 95.0 3.2 5532:40 5 vsearch
10962 andreeab 20 0 139464 2004 1224 R 20.0 0.0 0:00.08 11 top
10903 andreeab 20 0 183424 2832 1008 S 0.0 0.0 0:00.00 10 sshd
10904 andreeab 20 0 126208 2548 1700 S 0.0 0.0 0:00.04 18 bash
17477 andreeab 20 0 462724 13288 7988 S 0.0 0.0 0:03.10 19 orted
17486 andreeab 20 0 988948 190212 13764 S 0.0 0.1 1:13.78 2 python3.7
17487 andreeab 20 0 1185856 389832 13776 S 0.0 0.3 1:14.17 31 python3.7
17488 andreeab 20 0 940776 140960 13088 S 0.0 0.1 0:38.71 9 python3.7
17489 andreeab 20 0 1028408 229460 13708 S 0.0 0.2 1:10.15 19 python3.7
17490 andreeab 20 0 978328 179192 13532 S 0.0 0.1 1:04.10 19 python3.7

Isaac Overcast
@isaacovercast
@quattrinia Ok, great. Glad you got it figured out!
@andreeab917 It looks like you have paired-end data, in which case long runtimes and high memory usage are to be expected.
Qiuyu Jiang
@QiuyuJiang_twitter

@QiuyuJiang_twitter It's totally ok to ask about this here! The .phy output file is probably closest to a concatenated fasta format, but you'll have to do a little work to reformat it. Shouldn't be too hard. Good luck!

Thank you for your help! I am going to look into it.

wyj-lzu
@wyj-lzu
Hi, when I use ipa.pca I get this: AttributeError: 'str' object has no attribute 'ecdode'. Could you give me some suggestions about this? My script looks like this:
pca = ipa.pca(data=data, imap=imap, minmap=minmap, mincov=1, impute_method="sample")
Isaac Overcast
@isaacovercast
@wyj-lzu What version of ipyrad are you running? Please update to the most recent version and try again.
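For example (channels as in the ipyrad install docs):
$ conda upgrade -c conda-forge -c bioconda ipyrad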
Austin Koontz
@akoontz11
Hi everyone! I'm trying to find a meta-analysis/review paper which summarizes (any) genetic distance metrics calculated across different levels of taxa (ideally plant taxa) from RADseq data. If that rings any bells for ipyrad developers/users, would someone be able to point me in the right direction? Thanks, and apologies if this isn't the right type of question for this forum!
vrayno
@vrayno
Hello! I'm trying to demultiplex my raw lane reads by their i7 indexes (https://ipyrad.readthedocs.io/en/latest/API-assembly/cookbook-demultiplex-i7.html). I have 4 different indexes, each with 24 shared inner barcodes. For the barcode file, I tried using just the names of the four indexes and their i7 barcodes, but this does not pull up any results when I run step 1. The readthedocs page above is unclear about what should go in this barcodes file. Has anyone done this before, or have any idea what I should include in the barcode file for the initial outer barcode demultiplexing step?