Dec 2016
Jenny Archibald
Dec 13 2016 18:31

Hello @dereneaton @isaacovercast, I have had multiple runs of ipyrad end with errors in step 6. I was hoping that some changes to my settings and use of new versions of ipyrad would fix it, but the last run failed as well (using v.0.5.9). Perhaps you have some advice?

The error:
2016-12-12 20:38:21,255 pid=36133 [] ERROR error in singlecat (Qw_HPR10M) IOError(Driver write request failed (File write failed: time = mon dec 12 20:38:20 2016
, filename = '/panfs/', file descriptor = 61, errno = 5, error message = 'input/output error', buf = 0x7fff1b00b910, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0))
2016-12-12 20:38:22,104 pid=36133 [] ERROR tuple index out of range

Some of my settings:

PBS -l nodes=8:ppn=8,mem=504gb,walltime=168:00:00

ipyrad -p params-m04c90.txt -s 67 -c 64
running as pairedgbs, 288 accessions

I saw some advice above about using fewer cores. Am I using way too many? I am unfortunately almost out of space on our cluster and so I can't just try running several different iterations to see what works. Thanks for all your work on ipyrad!

Jenny Archibald
Dec 13 2016 19:54
As a follow-up: I have 1T available to hold my data (which take up 253G) and run the analyses. After the last run (partway through step 6), it is filled to over 96% capacity. Should it be using that much space? Could the problem be connected to running into space limits?
James Clugston
Dec 13 2016 20:53

@dereneaton @isaacovercast Hi guys, I am wondering if you can help me. I am trying to get some data into the R package adegenet and I was wondering if you can offer me some pointers on some of the questions it asks. I know @isaacovercast mentioned before that I am better off using the mydata.u.stu rather than mydata.stu. When I bring the file into the package it asks me a number of questions.
How many genotypes are there? (I am assuming this is just the number of samples, e.g. 74)
Which column contains labels for genotypes ('0' if absent)?
Which column contains the population factor ('0' if absent)?
Which other optional columns should be read (press 'return' when done)?
Which row contains the marker names ('0' if absent)?
Are genotypes coded by a single row (y/n)?

Can you guys, or anyone else, offer any advice? Also, when are you going to support the fineRADstructure package?

Deren Eaton
Dec 13 2016 21:27
Hi @jenarch, this problem looks like it could be a write-error caused by running out of hard-disk space. We've been working lately on refining memory management of paired-end data sets. But I did not expect that any intermediate files would be hundreds of Gigs in size. I'll definitely check on some test data sets and try to reduce the file size. The next update, coming soon, already has a number of changes that should help.
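
A quick way to check how close a working directory is to full before and after a step is sketched below. This is only a minimal illustration and not part of ipyrad: the helper names `disk_report` and `format_gib` are hypothetical, and `shutil.disk_usage` assumes Python 3.3 or later.

```python
import shutil


def disk_report(path="."):
    """Return (total, used, free, percent_used) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    pct_used = 100.0 * usage.used / usage.total
    return usage.total, usage.used, usage.free, pct_used


def format_gib(nbytes):
    """Format a byte count as GiB with one decimal place."""
    return f"{nbytes / 1024**3:.1f} GiB"


# Hypothetical usage: point this at the directory that holds the assembly.
total, used, free, pct = disk_report(".")
print(f"total={format_gib(total)} used={format_gib(used)} "
      f"free={format_gib(free)} ({pct:.1f}% used)")
```

Note that this reports the capacity of the whole filesystem. On a shared cluster a per-user quota may be the real limit, and that has to be checked with whatever quota tool the cluster provides.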
Deren Eaton
Dec 13 2016 21:39
@Cycadales_twitter, I haven't used adegenet, but the file should work since it works in STRUCTURE. The format is a bit weird because it has a number of empty columns in it that can be confusing. Here is my best guess:
How many genotypes are there? Probably the number of rows in the file, which is 2 x the number of samples.
Which column contains labels for genotypes? 1
Which column contains the population factor? 0
Which other optional columns should be read? There are five empty columns between the names and the data.
Which row contains the marker names? 0
Are genotypes coded by a single row? n
The fineRADstructure output is taking a while, since it requires some pretty complicated work to deal with the phasing of alleles. It's in the works, though.
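
The row and sample counts discussed in the message above can be checked directly from the file. The sketch below is a minimal illustration, assuming a whitespace/tab-delimited STRUCTURE-style file with the sample name in the first column and two rows per sample. The function name `summarize_structure_rows` and the example rows are hypothetical, and which of the two counts adegenet expects is not confirmed here.

```python
from collections import Counter


def summarize_structure_rows(lines):
    """Count data rows and distinct sample labels (first column) in STRUCTURE-style lines."""
    # Skip blank lines; the first whitespace-delimited token is taken as the label.
    labels = [ln.split()[0] for ln in lines if ln.strip()]
    per_sample = Counter(labels)
    return {
        "rows": len(labels),
        "samples": len(per_sample),
        "rows_per_sample": sorted(set(per_sample.values())),
    }


# Hypothetical two-sample example: name, five empty columns, then genotype data.
example = [
    "sampleA\t\t\t\t\t\t0\t1\t2",
    "sampleA\t\t\t\t\t\t0\t3\t2",
    "sampleB\t\t\t\t\t\t1\t1\t-9",
    "sampleB\t\t\t\t\t\t1\t1\t-9",
]
print(summarize_structure_rows(example))
```

For a real file, the lines could be read with `open(path).read().splitlines()` and passed to the same function. If `rows_per_sample` is anything other than `[2]`, the two-rows-per-sample assumption does not hold for that file.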
Jenny Archibald
Dec 13 2016 22:04
@dereneaton, thank you for the quick response! I will try again when the next update comes out and would definitely appreciate any changes that result in smaller files.