These are chat archives for **nickschurch/Kallisto_Salmon_Sailfish_comparison**

What's being correlated here is TPM, not transformed. I understand the issues surrounding use of Pearson's correlation, and we've had some discussion in our group here about disagreement metrics for clustering, but I am still of the opinion that Pearson's gives a good and interpretable general overview of data like this. While it's certainly true that it's somewhat sensitive to outliers, the correlations here are all still pretty good (particularly for salmon), which reflects that for the large majority of the points the agreement is excellent (as you'll see in the next plots). What I find interesting here is the differences in consistency between the answers the tools give. What is it about Salmon that makes it give results for each replicate that are more related than the sailfish results on *exactly* the same data?
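To illustrate the trade-off being discussed (Pearson's r on raw TPM is outlier-sensitive, but still a high, interpretable summary when the bulk of points agree), here is a minimal sketch on synthetic TPM-like data; none of the values below come from the actual comparison.

```python
# Synthetic demo: Pearson's r on raw TPM between two "replicates",
# with and without a single extreme outlier. Not real data.
import numpy as np

rng = np.random.default_rng(42)
base = rng.lognormal(1.0, 2.0, size=5000)            # heavy-tailed, TPM-like
rep1 = base * rng.normal(1.0, 0.05, size=base.size)  # 5% multiplicative noise
rep2 = base * rng.normal(1.0, 0.05, size=base.size)

r_clean = np.corrcoef(rep1, rep2)[0, 1]

# Inject one extreme disagreement and recompute
rep2_out = rep2.copy()
rep2_out[0] = rep2_out[0] + 50 * rep2.max()
r_outlier = np.corrcoef(rep1, rep2_out)[0, 1]

print(f"Pearson r: {r_clean:.4f} (clean) vs {r_outlier:.4f} (one outlier)")
```

A single far-off-diagonal point can pull r down substantially, which is why the tails in the later plots matter even when most points sit on the 1:1 line.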

This makes sense. It would of course be interesting to see how much of this is an overall distributional shift versus the correlation differences being driven by outliers.

To summarize some potential thoughts:

(1) The seeding of the offline optimization phase with the online abundance estimates may contribute to the higher correlation between replicates.

(2) Salmon's incorporation of extra auxiliary parameters may have a regularizing effect and may contribute to the higher correlation between replicates.

(3) The difference in correlation between replicates for sailfish and kallisto *may* be due to (a) differences in the initialization conditions of the EM, or (b) sailfish 0.8.0's omission of some improvements incorporated into 0.9.x

ok, so the next figure compares the output from two tools (I'll start with Sailfish and Salmon) for each condition for each annotation.

The axes are log10 of the mean TPM of a transcript across the replicates of a given condition in a given annotation for each tool. The red dotted line is the 1:1 relation. The plot is actually a point-density plot (to avoid plotting zillions of points), and the colour represents the log10 of the number of points in each 'hex'. The range runs from one point in a hex (purple) to thousands (red).

The dark grey lines in the background are the 1-standard-deviation error bars for all the points. For most points this is small, but for some (particularly low-TPM) transcripts it is larger and can be asymmetric. It gives us some idea of the level of variation in the measurements. The Pearson correlation (and 95% confidence intervals) are shown in the upper-left corner.
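For anyone wanting to reproduce this style of figure, here is a minimal matplotlib sketch of the plot described above (hexbin density with a log10 colour scale, a 1:1 reference line, and a Pearson r with an approximate Fisher-z 95% CI). The data, filename, and cosmetic choices are all placeholders; the actual notebook will differ.

```python
# Sketch of a tool-vs-tool hexbin plot on synthetic log10-TPM data.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
log_tpm_a = rng.normal(1.0, 1.2, size=20000)             # tool A, log10 mean TPM
log_tpm_b = log_tpm_a + rng.normal(0, 0.1, size=20000)   # tool B, mostly agreeing

r = np.corrcoef(log_tpm_a, log_tpm_b)[0, 1]
# Fisher z-transform gives an approximate 95% CI on r
z = np.arctanh(r)
se = 1.0 / np.sqrt(log_tpm_a.size - 3)
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)

fig, ax = plt.subplots(figsize=(5, 5))
hb = ax.hexbin(log_tpm_a, log_tpm_b, gridsize=60, bins="log", cmap="viridis")
lims = [log_tpm_a.min(), log_tpm_a.max()]
ax.plot(lims, lims, "r:")  # 1:1 reference line
ax.text(0.02, 0.95, f"r = {r:.3f} [{lo:.3f}, {hi:.3f}]", transform=ax.transAxes)
fig.colorbar(hb, ax=ax, label="log10(points per hex)")
fig.savefig("tool_vs_tool_hexbin.png", dpi=150)  # placeholder filename
```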

Unrelated to the science, but, these plots are *beautiful*. Do you mind if I ask how you made them?

What struck me here is that 1) the overall agreement between the tools looks very good; 2) the correlation between the tools gets slightly worse, and a 'tail' of transcripts with quite substantial TPM disagreements develops, as the number of transcripts in the annotation increases (tair < araport < atrtd); and 3) there is an asymmetry to the spread of the data around the 1:1 line.

Thanks Rob! They are made with matplotlib & seaborn in an ipython notebook. Soon I'm hoping for d3 interactive versions ;)

Ok, so the asymmetric behavior seems to be that, on relatively low abundance transcripts, when the methods disagree, salmon favors a lower TPM than sailfish.

Yes. In fact, I can (almost) convince myself by squinting that there is a set of high-TPM points where that is true too. Of course, we must remember that the vast majority of points lie in the very tight red/yellow band that sits essentially bang on the 1:1 line (reflected by the high R values).

So it looks to me like low-TPM transcripts are where the tools disagree most, and that, as the annotation grows, more transcripts fall into this category and the tools thus disagree more.

(1) The lower abundance transcripts are more difficult to quantify with high accuracy, as there is less evidence on which to estimate the abundance (assuming there is still a reasonable amount of multi-mapping).

(2) The more transcripts exist in the reference (especially if they are similar in terms of splicing patterns etc.) the more difficult the inference problem becomes (i.e. the inference problem becomes higher dimensional, and the likelihood function can look different)
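Both points above come down to the shared EM-style inference these tools perform over multi-mapping reads. A deliberately toy sketch (the equivalence classes and counts below are invented, and this is not any tool's actual code) of how ambiguous reads get fractionally assigned, and why more overlapping transcripts makes the problem harder:

```python
# Toy EM for transcript abundance from read "equivalence classes".
import numpy as np

# Each class: (read count, indices of compatible transcripts).
# More (and more similar) transcripts means more overlapping classes,
# i.e. a higher-dimensional, harder inference problem -- point (2).
eq_classes = [(100, [0]), (80, [1, 2]), (50, [0, 1]), (20, [2])]
n_tx = 3

theta = np.full(n_tx, 1.0 / n_tx)  # initial abundance estimates
for _ in range(200):
    counts = np.zeros(n_tx)
    for n_reads, tx_ids in eq_classes:
        # E-step: split each class's reads in proportion to current theta
        w = theta[tx_ids] / theta[tx_ids].sum()
        counts[tx_ids] += n_reads * w
    # M-step: renormalize expected counts into proportions
    theta = counts / counts.sum()

print("estimated proportions:", np.round(theta, 3))
```

Transcript 0, with many uniquely-mapping reads, is pinned down well; transcripts 1 and 2, supported mostly by shared classes, are where small differences in initialization or model details between tools can show up, which is consistent with the low-TPM disagreement tail in the plots.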