These are chat archives for FreeCodeCamp/DataScience

24th
Sep 2016
Akash Goel
@goelakash
Sep 24 2016 11:22
Hi
Does anybody here have any idea about Latent Semantic Analysis?
evaristoc
@evaristoc
Sep 24 2016 12:00
@goelakash I have. For what?
Akash Goel
@goelakash
Sep 24 2016 12:04
Okay, so let's say you have reduced a document matrix to lower dimensions
And you want to send a query with a document vector to find similar docs from your matrix
Now I understand how to reduce the query document to lower dimensions to make it compatible with your low-dimension document matrix
But since the low-dimension matrix doesn't map 1-to-1 to actual documents, how do you decide which documents to return for the query?
In other words, Application #2 from the wiki page - https://en.wikipedia.org/wiki/Latent_semantic_analysis#Applications
One way I imagine this happening is if you only reduce the number of term dimensions (i.e., instead of an m×n matrix, you now have a p×n doc matrix, where p<m, m is the number of terms in the document set, and p is the number of reduced concepts)
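For reference, the idea above can be written in the usual LSA notation. This is only a sketch: the symbols X, U_p, Σ_p, V_p and q are assumptions introduced here, not names used in the chat.

```latex
% X: m x n term-document matrix (m terms, n documents)
% Rank-p truncated SVD, with p < m:
X \approx U_p \, \Sigma_p \, V_p^{\top},
\qquad U_p \in \mathbb{R}^{m \times p},\;
\Sigma_p \in \mathbb{R}^{p \times p},\;
V_p^{\top} \in \mathbb{R}^{p \times n}

% Column j of V_p^T is still document j, now in p concept dimensions.
% A query vector q in term space (length m) is folded into the same space:
\hat{q} = \Sigma_p^{-1} \, U_p^{\top} \, q
```

Since the reduced matrix keeps one column per document, the one-to-one mapping to actual documents survives the reduction; only the term axis shrinks from m to p.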
Akash Goel
@goelakash
Sep 24 2016 12:16
Is this the correct way to solve the problem?
evaristoc
@evaristoc
Sep 24 2016 14:46

@goelakash It will always be an approximation, that is the idea. It also depends on the number of components you use for the reduction.

Normally you pick the top-ranked document in the comparison (you can get a ranking based on similarity), but you can always choose from a group of the best-ranked ones.
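A minimal sketch of that ranking step, assuming NumPy, a made-up toy term-document matrix, and cosine similarity as the comparison (the chat does not say which similarity measure is used). The helper names `fold_in` and `rank_documents` are hypothetical.

```python
import numpy as np

# Toy term-document matrix (made-up data): rows = terms, columns = documents.
# Terms: cat, dog, pet, stock, market
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

p = 2  # number of concepts kept (an assumption for this example)

# Truncated SVD: X is approximated by U_p @ diag(s_p) @ Vt_p
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_p, s_p, Vt_p = U[:, :p], s[:p], Vt[:p, :]

# One row per original document, in the reduced concept space (n x p)
docs_reduced = Vt_p.T


def fold_in(q):
    """Project a term-space query vector into the p-dimensional concept space."""
    return (U_p.T @ q) / s_p


def rank_documents(q):
    """Return document indices sorted by cosine similarity to the query, best first."""
    q_hat = fold_in(q)
    norms = np.linalg.norm(docs_reduced, axis=1) * np.linalg.norm(q_hat)
    sims = (docs_reduced @ q_hat) / norms
    order = np.argsort(-sims)
    return order, sims[order]


# Query containing the terms "cat" and "pet"
query = np.array([1, 0, 1, 0, 0], dtype=float)
order, sims = rank_documents(query)
print(order, sims)
```

Taking `order[0]` gives the single best match, while `order[:k]` gives a group of the best-ranked documents to choose from.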

I can't answer in detail at the moment, but later I will share a project with you that might explain that technique better.