@freeman-lab Hi Jeremy, I'm trying to use the PCA code in the thunder project, but I suspect the code isn't returning correct results, or I might be doing something wrong. E.g., take the simple array:

```python
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
```

The PCA code in thunder returns the following principal components:

```python
array([[-0.83849224, -0.54491354],
       [ 0.54491354, -0.83849224]])
```

while the result from scikit-learn is:

```python
array([[ 0.83849224,  0.54491354],
       [ 0.54491354, -0.83849224]])
```

Also, when I call transform on the same array and compute the variance along each PC, I get `([ 0.16666667, 0.16666667])`, which is wrong because the variance along the first PC should be larger than along the second. I'm getting similar results for my actual example, i.e. the variance along each PC is the same. Do you know what might be going on here? Thanks a lot for putting together this project. It's awesome.


@asishgeek thanks for the feedback! regarding the principal components, PCA only determines each component up to a sign flip, and if you look closely thunder's answer is equivalent to the one from sklearn up to a sign flip on the first component

some libraries enforce conventions to fix the sign one way or the other (e.g. make it so the largest element in each component is positive); i'm not sure if sklearn does, but matlab does
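To make the sign-convention idea concrete, here's a minimal sketch of one such rule (flip each component so its largest-magnitude entry is positive). `fix_signs` is a hypothetical helper, not code from thunder, sklearn, or matlab:

```python
import numpy as np

def fix_signs(components):
    """Flip each component (row) so its largest-magnitude entry is positive.

    One common convention for resolving PCA's sign ambiguity; an
    illustrative sketch only.
    """
    components = np.asarray(components, dtype=float).copy()
    for i, row in enumerate(components):
        if row[np.argmax(np.abs(row))] < 0:
            components[i] = -row
    return components

comps = np.array([[-0.83849224, -0.54491354],
                  [ 0.54491354, -0.83849224]])
# Under this rule both rows flip (each row's largest-magnitude entry is
# negative), which also shows that different libraries pick different
# conventions -- sklearn's output above keeps the second row as-is.
print(fix_signs(comps))
```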

regarding the variance explained on the transformed data, can you say more about how you're computing that? i'm confident the coefficients are identical to what sklearn does (up to the sign flip), but there may be differences in how explained variance is computed, and also how mean subtraction is handled during the transformation

i've been meaning to look into this more

@freeman-lab Thanks for the reply. I get your point about the sign flip. To compute the variance in sklearn I do `np.var(pca.transform(X), axis=0)`, which gives `array([ 6.61628593, 0.05038073])`. In thunder I do `X_t = pca.transform(X)` and then `X_t.variance()`, which returns `([ 0.16666667, 0.16666667])`. In the latter case X is a RowMatrix.
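For reference, the sklearn numbers above can be reproduced with plain numpy (a sketch equivalent to the sklearn call, using an SVD of the mean-centered data rather than sklearn itself):

```python
import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])

# Center the data, then take the SVD; rows of Vt are the principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the components and take the per-component variance.
scores = Xc @ Vt.T
print(np.var(scores, axis=0))  # ~ [6.61628593, 0.05038073]
```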

@freeman-lab I'm not sure where you want that line break to go

@rhofour sorry, just added a clarification =)

Ah, didn't do that with my other example dataset. I'll update both of them.

oh great, must've missed the first one

^- done

@asishgeek :point_up: May 3 2015 10:29 AM great, that helps explain it! in PCA.transform there's a step normalizing each transformed variable by the corresponding latent value (here). if i remove that, with your example, I get exactly the same output as the sklearn code. that extra division was inherited from a similar calculation in the SVD, but it may not be appropriate here given how people usually use transforms in PCA, so i think we'll remove it!
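A small numpy sketch of the effect described above (not thunder's actual implementation): dividing each projected column by its corresponding singular value whitens the scores, so every component ends up with the same variance, 1/n:

```python
import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T       # plain PCA transform
whitened = scores / S    # extra normalization by the latent values

print(np.var(scores, axis=0))    # unequal: ~ [6.61628593, 0.05038073]
print(np.var(whitened, axis=0))  # equal: [0.16666667, 0.16666667]
```

With n = 6 rows, each whitened column has variance 1/6 ≈ 0.16666667, which matches the output reported earlier in the thread.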

@rhofour thanks, merged!

:) Always happy to see stuff get merged upstream

Currently working on actually benchmarking our inverse

If we can make it competitive with other frameworks I'll send a huge PR

@freeman-lab :point_up: May 3 2015 11:43 AM Got it! Thanks a lot for the clarification.