Hi, @rambo-yuanbo! Maybe this will clear things up a bit:
I believe multiple tags let you associate a document with several tag IDs instead of only its own document ID, so the resulting vectors are tied to the tags and not just to the single document. The example in the SO question is quite descriptive.
I ran similarity_matrix = w2v_model.similarity_matrix(dictionary) and got AttributeError: 'KeyedVectors' object has no attribute 'similarity_matrix'. I couldn't find any reference to similarity_matrix in the docs, but I could be wrong. Can anyone better versed help me?
FastText: infer a vector for each word, then compute the document vector as the average of the word vectors.
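To make the averaging idea concrete, here is a minimal sketch in plain Python. The `toy_vectors` dict and the `average_vector` helper are hypothetical names used only for illustration; the dict stands in for the word lookup of a trained model (in gensim that would typically be something like `model.wv[word]`). A real FastText model can also build vectors for unseen words from subword n-grams, which this toy lookup does not do.

```python
from typing import Dict, List, Sequence


def average_vector(
    tokens: Sequence[str],
    lookup: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the vectors of all tokens found in `lookup`.

    `lookup` is a stand-in (assumption) for a trained model's word-vector table.
    Tokens missing from the toy lookup are skipped; if nothing is found,
    a zero vector of length `dim` is returned.
    """
    vectors = [lookup[token] for token in tokens if token in lookup]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every word vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Toy 2-dimensional "word vectors" (made up for the example).
toy_vectors = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "bad": [-1.0, 0.0],
}

doc_vec = average_vector(["good", "movie"], toy_vectors, dim=2)
print(doc_vec)  # [0.5, 0.5]
```

A plain mean weights every word equally, so frequent but uninformative words pull the document vector as much as rare ones; whether that matters depends on the corpus.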
from gensim.corpora import Dictionary
from gensim.models import LsiModel

data = [["a", "a", "b"], ["c", "d"]]
dictionary = Dictionary(data)
corpus = [dictionary.doc2bow(doc) for doc in data]
model = LsiModel(corpus, id2word=dictionary)
list(model[corpus])
# [[(0, 2.236067977499789)], [(1, -1.4142135623730951)]]