These are chat archives for dereneaton/ipyrad

21st
Jan 2019
Isaac Overcast
@isaacovercast
Jan 21 03:03
@jhoepke In general, rolling back to earlier versions of ipyrad will probably introduce more problems than it fixes, so I would advise against this. The original error we're seeing here (ValueError) is a problem with misspecified parameters during step 7. In your case it looks like the max_Indels_locus param is filtering out all loci. You can try increasing the values for this parameter and it should work better.
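For example, here's a minimal sketch of relaxing that filter and re-running step 7 with the ipyrad Python API (the assembly filename and the new values are just placeholders, and exact calls may differ a bit between versions):

import ipyrad as ip
data = ip.load_json("my_assembly.json")          # load an existing assembly (placeholder name)
data.set_params("max_Indels_locus", (16, 16))    # allow more indels per locus (R1, R2)
data.run("7", force=True)                        # re-run only the filtering/output step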
Isaac Overcast
@isaacovercast
Jan 21 03:19
@caesantibanez The ValueError (zero-size array to reduction operation minimum which has no identity) is caused by a filtering parameter removing all loci from the dataset. If you have PE 3RAD data and very long reads it may be because of the indels. Check your params file and maybe tune some of the filtering parameters to allow more indels or more het sites.
@LinaValencia85 Did you end up getting s7 to work okay? Sorry for the slow response, I have been at a conference so I'm catching up on stuff.
Isaac Overcast
@isaacovercast
Jan 21 03:26
@tommydevitt For the SRA problem you might have to verify that the SRA tools installed correctly (sorry, I know this is annoying). You might also try conda install -c bioconda perl-net-ssleay to fix a known bug.
@tommydevitt For the EngineError, this is almost certainly a memory allocation problem. How much RAM do you have on the system you're running on?
@tle003 toytree is an ipyrad affiliated package (since Deren wrote it), so it might be better to ask for help on this here https://gitter.im/eaton-lab/Lobby. Sorry I can't be of more assistance but I really don't know the toytree code that well.
Isaac Overcast
@isaacovercast
Jan 21 03:47
@tim-oconnor See my earlier comments about the step 7 ValueError issue, it's a matter of parameter settings overfiltering and removing all loci from your data.
Isaac Overcast
@isaacovercast
Jan 21 03:53
@tim-oconnor reads_passed_filter includes all the reads, whereas the refseq mapped and unmapped reads have undergone a dereplication process that removes identical reads. This explains the count difference.
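As a toy illustration (this is not ipyrad code, just the idea): dereplication collapses identical reads before mapping, so the mapped/unmapped totals are counted over fewer sequences than reads_passed_filter.

reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]   # 6 reads passed filter
unique = set(reads)                                        # 3 distinct reads after dereplication
print(len(reads), len(unique))                             # prints: 6 3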
Isaac Overcast
@isaacovercast
Jan 21 04:00
@tim-oconnor For the reference vs denovo+reference issue, this does seem to be a problem; I'll have to look into it more. Let me get back to you tomorrow.
tim-oconnor
@tim-oconnor
Jan 21 04:51
@isaacovercast Thank you! I know you were away running RADcamp, so I'm sorry to pile up the questions when you're busy. I appreciate the help.
jhoepke
@jhoepke
Jan 21 07:27
@isaacovercast Regarding the error in step 7 ("ERROR IPyradWarningExit: error in filter_stacks on chunk 0: ValueError(zero-size array to reduction operation minimum which has no identity)") you suggested "In your case it looks like the max_Indels_locus param is filtering out all loci." THIS IS DEFINITELY NOT the case, since this value had the same settings (8, 8) for both versions 0.7.28 and 0.7.1, but only the older version was able to complete step 7 on my example dataset. There must be another problem! I will send you my minimal example; maybe this will help you.
tim-oconnor
@tim-oconnor
Jan 21 17:35

@jhoepke @isaacovercast I've tried dramatically loosening the filtering criteria for step 7, but it does seem like something else is going on. I get the same error as before with the following (ludicrous) parameters.

2                              ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
100, 100                       ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
100, 100                       ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
1                              ## [21] [min_samples_locus]: Min # samples per locus for output
100, 100                       ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
100, 100                       ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
.99                            ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 0, 0, 0                     ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0                     ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)
G, a, g, k, m, l, n, p, s, u, t, v ## [27] [output_formats]: Output formats (see docs)
                               ## [28] [pop_assign_file]: Path to population assignment file

There are definitely loci to choose from after step 6, since I can see plenty of clusters in lig-lig_catclust.gz.

In case the issue was something to do with 3RAD assemblies, I also returned to step 3 and ran a branch where I set datatype to pairddrad, but I got the same error. I'm happy to send files if that's helpful.

Isaac Overcast
@isaacovercast
Jan 21 17:47
@tim-oconnor Yeah, can you dropbox me the raw files? And a params file. Something weird is happening and I can't reproduce the error, so having your files will help.
Isaac Overcast
@isaacovercast
Jan 21 18:39
@tim-oconnor @jhoepke Ok, well I see the problem. If you have paired-end data and all of the loci are merged (no non-overlapping R1/R2) then it throws this error. I can fix the code. The simplest way to work around this is to run only step 7 using the datatype 'gbs' rather than 'pairgbs'. I tested on @jhoepke's data and it worked fine. Sorry, that's a really nasty bug!
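If you're running through the API, a rough sketch of that workaround would look something like the following (the assembly and branch names are placeholders); the same thing works from the CLI by branching and editing the datatype line in the new params file:

import ipyrad as ip
data = ip.load_json("my_assembly.json")     # placeholder assembly name
work = data.branch("gbs_workaround")        # branch so the original assembly is untouched
work.set_params("datatype", "gbs")          # treat merged pairs as single-end just for step 7
work.run("7", force=True)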
Isaac Overcast
@isaacovercast
Jan 21 18:59
@tim-oconnor @jhoepke Pushed a new conda package (for Linux), v.0.7.29.
conda install -c ipyrad ipyrad
tim-oconnor
@tim-oconnor
Jan 21 19:35

@isaacovercast Thanks! It seems I have a different issue, though perhaps related to @jhoepke's. My data contain a mix of merged and (mostly) unmerged reads. I updated to v.0.7.29, but step 7 is still throwing the same error. Changing the datatype to "ddrad" did help me get a bit farther down the road with step 7, but then I get this error:

  Step 7: Filter and write output files for 192 Samples
  [####################] 100%  filtering loci        | 0:02:21  
  [####################] 100%  building loci/stats   | 0:00:12  
  [####################] 100%  building alleles      | 0:00:19  
  [                    ]   0%  building vcf file     | 0:00:08  
  Encountered an unexpected error (see ./ipyrad_log.txt)
  Error message is below -------------------------------
TypeError(argument of type 'int' is not iterable)

Here's the log for that run. It looks like there are some other errors in there too:

 -------------------------------------------------------------
  ipyrad [v.0.7.29]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  Begin run: 2019-01-21 11:25
  Using args {'force': True, 'threads': 2, 'results': False, 'quiet': False, 'merge': None, 'ipcluster': None, 'cores': 22, 'params': 'params-lig-lig.txt', 'branch': None, 'steps': '7', 'debug': False, 'new': None, 'download': None, 'MPI': False}
  Platform info: ('Linux', 'n0150.savio2', '3.10.0-693.11.6.el7.x86_64', '#1 SMP Wed Jan 3 18:09:42 CST 2018', 'x86_64')
2019-01-21 11:29:40,833     pid=14349     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:41,120     pid=14344     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:41,287     pid=14315     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:41,447     pid=14353     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:41,619     pid=14360     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:41,621     pid=14350     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:42,076     pid=14359     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:42,186     pid=14357     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:42,303     pid=14343     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:42,621     pid=14019     [assembly.py]    ERROR     TypeError(argument of type 'int' is not iterable)
2019-01-21 11:29:43,041     pid=14342     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:43,054     pid=14347     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:43,110     pid=14339     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:43,237     pid=14341     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:43,331     pid=14351     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:43,358     pid=14348     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0
2019-01-21 11:29:43,662     pid=14358     [write_outfiles.py]    ERROR     Invalid chromosome dictionary indexwat: 0

I've just finished compiling my files so I'll share those with you in a moment. They may also be helpful for diagnosing the reference / denovo+reference weirdness.

Isaac Overcast
@isaacovercast
Jan 21 20:02
Hm, yeah, that looks different. Let me know when the files are ready and I'll take a look.
tim-oconnor
@tim-oconnor
Jan 21 20:04
@isaacovercast You should have gotten an invitation a little bit ago via Gmail. Let me know if it hasn't arrived or if you need me to send it to a different email address (I used your academic one).
Whoops, wrong address -- incoming now.
Isaac Overcast
@isaacovercast
Jan 21 21:17
Got it. It looks like there's only R2 in the Google Drive. Can you upload R1 as well?
tim-oconnor
@tim-oconnor
Jan 21 21:19
@isaacovercast From my end it looks like there are both R1 and R2 files for the two different plates (lig1, lig2) -- could you double check if it's showing up for you, too? Otherwise I can try again.
Isaac Overcast
@isaacovercast
Jan 21 21:24
Could just be a syncing issue, I'll give it a minute and check again.