gensim to vectorize and cluster a relatively small set of sentences; the frequency counts inside our data set are misleading for the purposes of TF-IDF, so we need to use external frequencies. I imagine we are not the only ones. I was just wrapping
LsiModel into our model, and for the first two I could use
.build_vocab_from_freq(), while for LSI I had to use this workaround.
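To make the idea of "external frequencies" concrete, here is a minimal pure-Python sketch of TF-IDF weighting where the IDF part comes from a reference corpus rather than from the small data set being clustered. It is not the workaround mentioned above and does not call gensim; the function names and the reference counts are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def external_idf(doc_freqs, num_docs):
    """Compute IDF from external statistics.

    doc_freqs maps term -> number of reference documents containing it
    (hypothetical external counts, not taken from the data being clustered).
    """
    return {
        term: math.log(num_docs / df)
        for term, df in doc_freqs.items()
        if df > 0
    }


def tfidf_vector(tokens, idf, default_idf=0.0):
    """Weight the raw term counts of one sentence by the external IDF.

    Terms missing from the external table get default_idf (an assumption;
    another fallback may suit a real data set better).
    """
    counts = Counter(tokens)
    return {term: count * idf.get(term, default_idf) for term, count in counts.items()}


# Hypothetical external statistics: 1000 reference documents.
reference_doc_freqs = {"the": 1000, "cluster": 10, "sentence": 100}
idf = external_idf(reference_doc_freqs, num_docs=1000)

vec = tfidf_vector(["the", "cluster", "the", "sentence"], idf)
```

A term that occurs in every reference document ("the") gets weight zero regardless of how often it appears in the sentence, while rarer terms keep a positive weight. To use such weights with an LSI model, they would still need to be converted into sparse (term id, weight) pairs.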
Sent2Vec PR though, because that might have potential too.
doctag_locks have float elements? Why? The comments seemed clear enough that these control whether weights can be further changed during
train execution. A Boolean seems the most appropriate type. Is there any reason for using floats instead? Are the values simply used elsewhere to multiply weights? What would be the
docvecs section have the same (±1%) value. What's the matter, and how can I fix that?