These are chat archives for dereneaton/ipyrad

12th
Nov 2018
Nathan Byer
@Testudinidude_twitter
Nov 12 2018 17:13
Hi there! I am working through merging three sequencing runs - the first two are Illumina HiSeq 2500 High Throughput 1x100 runs, and the last is a HiSeq 2500 Rapid 2x100 run. Samples are identical between runs 1 and 2, but about 10% of samples in the 2x100 run were replaced by new samples (as the samples from runs 1 and 2 were no longer available); for run 3, those 10% of new samples were given unique names to prevent issues downstream. Since runs 1 and 2 share the same samples, merging them went fine; from here on out I will refer to this merged library as run 1+2. However, attempting to merge run 3 with run 1+2 leads to some issues. In particular, since R2s are available for run 3 but not run 1+2, ipyrad crashes with the following error during step 2: IPyradError(Error concatenating fastq files. Make sure all these files exist: ['cat', '', '/media/ByerBackup/ipyradpaired_formerging/turtles_rapid_paired_formerging_fastqs/H09R2.fastq.gz']. It sounds like, since R2s are not available in run 1+2, these libraries cannot be merged while retaining the paired-end reads (when available). Is this interpretation correct, or do you have any suggested work-arounds for situations like this (merging paired-end and single-end runs from more-or-less the same samples)?
Isaac Overcast
@isaacovercast
Nov 12 2018 17:59
@Testudinidude_twitter Yeah, merging paired-end and single-end data is not directly supported. The most straightforward way is just to run the PE as SE and ignore R2. This is guaranteed to work, but results in throwing out data. Another way could be to try merging after step 3, since this is when PE reads are 'merged' and from there on are treated somewhat more like SE data. If all your PE reads are overlapping this should work nicely. If many of your reads are non-overlapping it may still work, but you may have to tweak parameters to get it to run, no promises. Dropping R2 is my recommendation, just run it this way and see how it goes. If you need the info from R2 then you can try hacking on it after that.
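A minimal sketch of the "run the PE as SE and ignore R2" route: before pointing ipyrad at run 3, build a list of only the R1 files. The `drop_r2` helper and the `R2_PATTERN` regex are hypothetical and not part of ipyrad, and the pattern assumes read-2 files carry an `R2` tag right before the extension (as in `H09R2.fastq.gz` from the error above), so adjust it if your files are named differently.

```python
import re

# Assumption: read-2 files end in "R2.fastq.gz" or "_R2_.fastq.gz"
# (optionally ".fq" and/or uncompressed). Not an ipyrad convention check.
R2_PATTERN = re.compile(r"_?R2_?\.(fastq|fq)(\.gz)?$")


def drop_r2(filenames):
    """Return the file names that do not look like read-2 files."""
    return [name for name in filenames if not R2_PATTERN.search(name)]


# Hypothetical listing of a run 3 fastq directory.
run3_files = [
    "H09R1.fastq.gz",
    "H09R2.fastq.gz",
    "A01_R1_.fastq.gz",
    "A01_R2_.fastq.gz",
]
print(drop_r2(run3_files))
```

The surviving R1 files could then be linked or copied into their own directory, and run 3 assembled from that directory as single-end data before merging with run 1+2.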
Nathan Byer
@Testudinidude_twitter
Nov 12 2018 18:02
@isaacovercast great, thanks - that was my thinking as well! I will just try the R1s for now.
Sarah J Jacobs
@sarahjjacobs
Nov 12 2018 19:25
Hi folks - I'm trying to run a bucky analysis and am running into problems parsing my .loci file into separate nexus files for the mrbayes runs (see error(s) posted below). Data are paired-end and unphased, therefore I have a .loci file (as opposed to an _alleles.loci file, which as I understand it ipyrad does not generate at the present moment). The initial error suggested to me that the main problem is the 'n' that is used to link R1 with R2 during step 3 - it is unrecognized by the script as a valid character in the sequence. If I go into the bucky.py script and add an 'n' to the _AMBIGS dictionary (with the options being to assign an 'N' in its place), and then rerun the script, I get a different error, now telling me there's a formatting issue with my alleles file and that I should rerun step 7 using a more recent version of ipyrad. The version I'm using (v.0.7.24) post-dates the bug fix. So, I guess my question is, in the current version of ipyrad is it possible to parse the .loci file into separate nexus files, or am I fighting a losing battle? Has anyone out there done this recently?

Here's the initial error:


KeyError Traceback (most recent call last)

<ipython-input-5-702fd1212821> in <module>()
----> 1 c.write_nexus_files(force=True)

/Users/sjjacobs/miniconda2/lib/python2.7/site-packages/ipyrad/analysis/bucky.py in write_nexus_files(self, force, quiet)
231 ## sample one of the alleles if .alleles file.
232 if not self._alleles:
--> 233 seqsamp = _resolveambig(seqsamp)
234
235 ## find parsimony informative sites

/Users/sjjacobs/miniconda2/lib/python2.7/site-packages/ipyrad/analysis/bucky.py in _resolveambig(subseq)
605 for col in subseq:
606 rand = np.random.binomial(1, 0.5)
--> 607 N.append([_AMBIGS[i][rand] for i in col])
608 return np.array(N)
609

KeyError: 'n'

Here's the error after I include the 'n' in the _AMBIGS dictionary:

Warning: encountered an error in the alleles file format. This
is a bug that was fixed in v.0.7.2. Rerun step 7 on this data
set to ensure that the alleles file is properly formatted.

wrote 0 nexus files to ~/Desktop/Escallonia_sptree/bucky/bucky
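
The workaround described above (treating the 'n' that joins R1 and R2 as missing data before resolving ambiguity codes) can be sketched as follows. This is not the code in bucky.py: `resolve_ambig` and `AMBIGS` are hypothetical stand-ins, and writing the spacer out as 'N' is an assumption about what downstream tools expect. It only addresses the KeyError, not the alleles-format warning that follows it.

```python
import random

# IUPAC codes for sites where two bases are possible.
AMBIGS = {
    "R": "GA", "K": "GT", "S": "GC",
    "Y": "TC", "W": "TA", "M": "CA",
}


def resolve_ambig(seq, rng=None):
    """Pick one base at random for each ambiguity code in seq.

    The lowercase 'n' spacer between R1 and R2 is written out as 'N'
    (an assumption); every other character passes through unchanged.
    """
    rng = rng or random.Random()
    resolved = []
    for base in seq:
        if base == "n":
            resolved.append("N")
        elif base in AMBIGS:
            resolved.append(rng.choice(AMBIGS[base]))
        else:
            resolved.append(base)
    return "".join(resolved)


# Made-up paired-end sequence with the 'nnnn' spacer in the middle.
print(resolve_ambig("ACRTnnnnGGYA", random.Random(0)))
```

Passing a seeded `random.Random` makes the sampled alleles reproducible between runs, which helps when comparing the nexus files that get written.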