
@eIGato Sure, I understand that the norm means a vector's length. But I was confused about whether I should average the vectors first, then normalize the average and compute the inner product, OR normalize each vector and compute the inner products, then take the average (or equivalently, average and then take the inner product)?

The result will be quite different between the two ways.

The difference is whether you normalize the average vector, OR normalize each vector and then take the average.

Exactly. If some word's vector length is high, it will have more impact on the average. If word2vec's vector length does carry some significant meaning, it's more justified to average the raw vectors. But for some of my documents, the average word vector gives unexpectedly high similarities to some obviously irrelevant query words.

That is why I am wondering whether averaging raw word2vec vectors makes more sense than averaging each word's similarity.
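The two orderings can give noticeably different similarities when word vector lengths vary a lot. A minimal numpy sketch with toy 2-D vectors (not real word2vec output):

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    return v / np.linalg.norm(v)

# Two toy "word vectors": v1 is much longer than v2.
v1 = np.array([10.0, 0.0])
v2 = np.array([0.0, 1.0])
query = np.array([0.0, 1.0])

# Option A: average the raw vectors, then normalize the average.
# The long vector v1 dominates the direction of the result.
doc_a = unit((v1 + v2) / 2)

# Option B: normalize each vector first, then average (and normalize).
# Every word contributes equally, regardless of its length.
doc_b = unit((unit(v1) + unit(v2)) / 2)

sim_a = float(doc_a @ unit(query))  # cosine similarity, option A
sim_b = float(doc_b @ unit(query))  # cosine similarity, option B
print(sim_a, sim_b)
```

Here option A says the document is nearly unrelated to the query (about 0.10), while option B says they are fairly similar (about 0.71), purely because of the length imbalance.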

@rambo-yuanbo What are you trying to do anyway? Forget the mechanics of it, what is the general goal?

Hi everyone, my name is Pramodith and I'm a graduate student at the Georgia Institute of Technology. I'm interested in contributing to the gensim library as part of GSoC 2018. I would really like to work on neural networks and evaluate and implement a published paper. Am I too late to the party? And can anyone give me more guidance on how to move forward?

Any solution?

Previously I was asking about fixing certain vectors in place, and found out that there was a mechanism to do that. I am thinking it should be possible to have a model `converge` to more or less the same state, if I were to keep certain words'/documents' vectors fixed while mutating all others.

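The freeze-in-place mechanism can be pictured as a per-vector lock factor that scales each gradient update, so locked vectors receive zero update while the rest keep training. Gensim exposes something along these lines as `vectors_lockf`; the toy update below is an illustrative sketch, not the library's actual training loop:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(4, 3))        # 4 toy word/doc vectors
lockf = np.array([0.0, 1.0, 1.0, 1.0])   # 0.0 = frozen, 1.0 = trainable

before = vectors.copy()

# One mock SGD step: every vector gets a gradient update,
# scaled by its per-vector lock factor.
grad = rng.normal(size=vectors.shape)
vectors += 0.025 * lockf[:, None] * grad

# vectors[0] is unchanged; vectors[1:] have moved.
```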
Hi guys. I've thought up a hack for Doc2Vec inference, but I don't know if it makes any sense.


The problem was that `infer_vector()` produces a vector that is very different from the bulk-trained vector of the same document. The hack is that, after the bulk training, I just re-infer all vectors and replace all document vectors with the inferred ones.

Does that make sense?

Do it a few hundred times, maybe it will converge to something stable. :-)

I've calculated the similarity of the first and second inference: it's about `1.0` for all the vectors, with steps = 1000.