What about testing on real data? I haven't rerun the GBS data through the whole pipeline in a while. It might be comforting to re-run some real data start to finish; it would be nice to feel confident, at least with the datatypes we have access to.
Did you see the advice I gave cyclades about increasing max low quality sites vs. decreasing the phredQ cutoff? Do you have any thoughts on which is the better strategy? I tested his data with a converted phredQ score of 13 (~95% confidence) and got lots more reads passing the filter. Downstream, do you think it's better to have more N's or more low-confidence base calls?
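For reference, the ~95% figure comes from the standard Phred definition, Q = -10 * log10(P_error). A quick sketch to check other cutoffs (the function names are just made up for illustration, they are not part of the pipeline):

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score implied by an error probability p."""
    return -10 * math.log10(p)

# Compare a few candidate cutoffs side by side.
for q in (13, 20, 25, 33):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4f}, confidence {100 * (1 - p):.1f}%")
```

So dropping the cutoff from 20 to 13 means accepting base calls with about a 5% chance of being wrong instead of 1%.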
@dereneaton @isaacovercast what do you guys think of these settings that I am using for 150 bp PE read data: max_low-qual_bases = 20 and phredQ score = 25? I am running this on a work computer with a 24-core CPU and 96 GB of RAM. What would you guys suggest?
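To make the interaction between those two settings concrete, here is a rough sketch of how I understand the filter. It assumes bases below the phredQ cutoff get masked to N and a read is dropped once it has more than the allowed number of low-quality bases. `mask_and_filter` and its arguments are hypothetical, this is not the pipeline's actual code, and it assumes Phred+33 encoded quality strings:

```python
def mask_and_filter(seq, quals, phred_min=25, max_low_qual=20, offset=33):
    """Mask low-quality bases to N; return None if the read has too many.

    Hypothetical helper for illustration only.
    seq   : read sequence, e.g. "ACGT"
    quals : FASTQ quality string of the same length (Phred+33 assumed)
    """
    # Decode each quality character into an integer Phred score.
    scores = [ord(c) - offset for c in quals]
    # Count the bases that fall below the cutoff.
    n_low = sum(q < phred_min for q in scores)
    if n_low > max_low_qual:
        return None  # read is discarded
    # Keep the read, replacing each low-quality base with N.
    return "".join(b if q >= phred_min else "N" for b, q in zip(seq, scores))

# 'I' decodes to Q40 and '#' to Q2 under Phred+33.
print(mask_and_filter("ACGT", "II#I", phred_min=25, max_low_qual=1))
```

Under this reading, raising the max low quality setting keeps more reads but puts more N's into them, while lowering phredQ keeps more reads with fewer N's but more doubtful base calls.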
FYI, I should note that cutters with an ambiguous base are not yet supported for demultiplexing. Also, step 1 is really poorly parallelized at the moment, at least since my last round of changes. It's on my TODO list.