These are chat archives for dereneaton/ipyrad

15th
Nov 2016
Amanda Haponski
@ahaponski_twitter
Nov 15 2016 15:48

Hello, I'm just switching over to ipyrad and am having a hard time getting the program to run. I am trying to run it with PE ddrad data. With line 7 set to "pairddrad" I keep getting the following error: "ipyrad [v.0.4.4]

Interactive assembly and analysis of RAD-seq data

New Assembly: pairpartest
local compute node: [12 cores] on nyx5564.arc-ts.umich.edu

Step 1: Loading sorted fastq data to Samples

Encountered an unexpected error (see ./ipyrad_log.txt)
Error message is below -------------------------------
float division by zero" I would check the log file, but it's blank. I am also running this with line 7 set to just ddrad, and that appears to be running fine (or at least it has been running for ~1 hour, rather than terminating after a minute). Any advice? Thanks in advance.

Edgardo M. Ortiz
@edgardomortiz
Nov 15 2016 16:00
Hi, the naming of the fastq files is important. Could you post an example and also the params file you are using? I also had some trouble when I switched.
Amanda Haponski
@ahaponski_twitter
Nov 15 2016 16:28

Thanks @edgardomortiz!!! So an example of my fastq is 26a1_R1.fq.gz and 26a1_R2.fq.gz. And here also is my params file:
------- ipyrad params file (v.0.4.4)--------------------------------------------
pairpartest ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/scratch/diarmaid_fluxod/haponski/codetest/ipyrad/pairddrad ## [1] [project_dir]: Project dir (made in curdir if not present)

                           ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
                           ## [3] [barcodes_path]: Location of barcodes file

/scratch/diarmaid_fluxod/haponski/partula_w_unknowns/fastq/*.fq.gz ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)

                           ## [6] [reference_sequence]: Location of reference sequence file

pairddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
CG, AATT ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
4 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
43 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
2 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
1 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
1 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
4, 4 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8, 8 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
105 ## [21] [min_samples_locus]: Min # samples per locus for output
20, 20 ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
8, 8 ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
6, 6 ## [25] [edit_cutsites]: Edit cut-sites (R1, R2) (see docs)
0, 5, 5, 0 ## [26] [trim_overhang]: Trim overhang (see docs) (R1>, <R1, R2>, <R2)

* ## [27] [output_formats]: Output formats (see docs)
                           ## [28] [pop_assign_file]: Path to population assignment file

Edgardo M. Ortiz
@edgardomortiz
Nov 15 2016 16:45
@ahaponski_twitter the first thing I see is that your names are lacking an underscore after R1 and R2 respectively; they should be 26a1_R1_.fq.gz and 26a1_R2_.fq.gz. Also try updating to version 0.5.1, which fixed some bugs for step 4 and brought other improvements. I also noticed you have 43 for your quality filter (parameter [10]); the default 33 is pretty high already, so you may lose many reads. Finally, for parameter [16] I would set it to 2; the adapter filter is good and improved my assemblies.
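
The naming point above can be checked before launching step 1. Here is a minimal sketch, assuming (as described in the message above) that paired files are recognized by the substrings `_R1_` and `_R2_` in the file name. The helper `check_paired_names` is hypothetical and is not part of ipyrad.

```python
from pathlib import PurePath


def check_paired_names(filenames):
    """Split file names into matched R1/R2 pairs and names lacking the tags.

    Assumption: a paired-end file is identified by "_R1_" or "_R2_"
    appearing somewhere in its base name.
    """
    names = [PurePath(f).name for f in filenames]
    r1 = {n for n in names if "_R1_" in n}
    r2 = {n for n in names if "_R2_" in n}
    # Names carrying neither tag would not be recognized as paired reads.
    untagged = sorted(n for n in names if n not in r1 and n not in r2)
    # An R1 file is paired when the same name with "_R2_" also exists.
    pairs = sorted(
        (n, n.replace("_R1_", "_R2_"))
        for n in r1
        if n.replace("_R1_", "_R2_") in r2
    )
    return pairs, untagged


old_pairs, old_untagged = check_paired_names(["26a1_R1.fq.gz", "26a1_R2.fq.gz"])
new_pairs, new_untagged = check_paired_names(["26a1_R1_.fq.gz", "26a1_R2_.fq.gz"])
print(old_pairs, old_untagged)
print(new_pairs, new_untagged)
```

With the original names, no pairs are found and both files are reported as untagged. With the renamed files, one pair is found and nothing is left over.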
Amanda Haponski
@ahaponski_twitter
Nov 15 2016 16:46
@edgardomortiz Thank you!! I will give those changes a shot and see what happens.
Edgardo M. Ortiz
@edgardomortiz
Nov 15 2016 21:07
Hello @dereneaton @isaacovercast, any advice for this dataset? I think my 48 hrs will run out before the database gets built. It is a largemem node with 512GB RAM, so I can't request more time:
 -------------------------------------------------------------
  ipyrad [v.0.5.1]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: panicum-r1-86
  from saved path: /scratch/01982/jdpalaci/ddrad/f_SE-86_ipyrad/panicum-r1-86.json
  host compute node: [24 cores] on nid02000.ls5.tacc.utexas.edu

  Step 6: Clustering at 0.86 similarity across 666 samples
  [####################] 100%  concat/shuffle input  | 0:08:29
  [####################] 100%  clustering across     | 6:29:59
  [####################] 100%  building clusters     | 0:10:07
  [####################] 100%  aligning clusters     | 12:10:05
  [####################] 100%  database indels       | 2:34:03
  [####################] 100%  indexing clusters     | 1:11:21
  [#                   ]   6%  building database     | 5:22:16
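
A rough check of the 48-hour concern, using only the timings printed above. This is a sketch that assumes the "building database" stage progresses linearly and that the displayed 6% is exact (it is likely rounded), so the result is only an order-of-magnitude estimate. The helper names are hypothetical.

```python
def to_seconds(hms):
    """Convert an "H:MM:SS" string from the progress bar into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimate_total_hours(elapsed_hms, fraction_done):
    """Extrapolate a stage's total duration, assuming linear progress."""
    return to_seconds(elapsed_hms) / fraction_done / 3600


# Stages of step 6 that already reached 100% in the log above.
finished = ["0:08:29", "6:29:59", "0:10:07", "12:10:05", "2:34:03", "1:11:21"]
finished_hours = sum(to_seconds(t) for t in finished) / 3600

# "building database" was at 6% after 5:22:16.
db_hours = estimate_total_hours("5:22:16", 0.06)
projected = finished_hours + db_hours

print(f"finished stages:    {finished_hours:.1f} h")
print(f"database (est.):    {db_hours:.1f} h")
print(f"step 6 total (est.): {projected:.1f} h")
```

Under these assumptions, the database stage alone extrapolates to well beyond the 48-hour limit.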