Hi, @rambo-yuanbo! Maybe this will clear things up a bit:
I believe multiple tags allow a single document to be associated with several tag IDs instead of only its own document ID, and a vector is learned for each of those tags. The example in the SO question is quite descriptive.
similarity_matrix = w2v_model.similarity_matrix(dictionary)
The error I got was AttributeError: 'KeyedVectors' object has no attribute 'similarity_matrix'. I couldn't find any reference to similarity_matrix in the docs, but I could be wrong. Can anyone more familiar with this help?
model[['my', 'document']]
FastText
- infer a vector for each word and compute the document vector as the average of the word vectors
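The averaging approach can be sketched as below. `document_vector` and `toy_wv` are hypothetical names; the helper accepts anything that supports `wv[word]`, so with gensim you would presumably pass a trained FastText model's `.wv` (an assumption about how you hold the model). FastText can usually build vectors for unseen words from character n-grams, which is why it suits this approach:

```python
import numpy as np


def document_vector(wv, words):
    """Return the mean of the word vectors for `words`.

    `wv` is any mapping with wv[word] -> 1-D array, e.g. a FastText
    model's KeyedVectors (assumption; not tied to a specific API here).
    """
    if not words:
        raise ValueError("cannot average an empty document")
    # Stack one row per word, then average row-wise into a single vector.
    vectors = np.stack([np.asarray(wv[word], dtype=float) for word in words])
    return vectors.mean(axis=0)


# Toy lookup standing in for a trained model's word vectors (made-up values).
toy_wv = {"my": np.array([1.0, 0.0, 2.0]), "document": np.array([3.0, 2.0, 0.0])}
doc_vec = document_vector(toy_wv, ["my", "document"])
print(doc_vec)  # [2. 1. 1.]
```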
from gensim.corpora import Dictionary
from gensim.models import LsiModel
data = [["a", "a", "b"], ["c", "d"]]
dictionary = Dictionary(data)
corpus = [dictionary.doc2bow(doc) for doc in data]
model = LsiModel(corpus, id2word=dictionary)
list(model[corpus]) # [[(0, 2.236067977499789)], [(1, -1.4142135623730951)]]
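The numbers in that output can be reproduced by hand. Assuming LsiModel's default (unscaled) projection of a bag-of-words vector x is U^T x, where U holds the left singular vectors of the term-document matrix, a plain NumPy SVD of the same toy counts shows why the weights are sqrt(5) ≈ 2.236 and sqrt(2) ≈ 1.414. The two documents share no terms, so each one loads on its own topic with weight equal to the norm of its count vector. Signs are arbitrary in an SVD, so only magnitudes are meaningful:

```python
import numpy as np

# Term-document counts for data = [["a", "a", "b"], ["c", "d"]].
# Rows are terms a, b, c, d; columns are the two documents.
A = np.array([
    [2.0, 0.0],  # "a" appears twice in doc 0
    [1.0, 0.0],  # "b"
    [0.0, 1.0],  # "c"
    [0.0, 1.0],  # "d"
])

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Project each document (column of A) onto the topics: U^T x.
projections = U.T @ A  # shape (n_topics, n_docs)

print(s)  # approximately [sqrt(5), sqrt(2)]
# Doc 0 loads only on topic 0 (|weight| = sqrt(5)); doc 1 only on topic 1 (|weight| = sqrt(2)).
print(np.abs(projections).round(6))
```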