    Joseph Bullock

    Hi @ggqshr, no problem. If you want to reproduce the same result each time, you can set random_state to an integer value. See the parameters on the gensim page: https://radimrehurek.com/gensim/models/ldamodel.html

    Hope this helps :)

    @JosephPB Unfortunately, I have set random_state, but the results are still different each time. My situation is the same as the following page, but the passes parameter does not work.
    Philippe Rivière
    Hello, I'm using gensim to generate an LDA model of my documents. Then I export the vectors to Matrix Market format and create a 2D embedding with UMAP in JavaScript. So far so good. Now I would like to do this UMAP transform in Python, but I can't find out how to "convert" the document vectors into the LDA topic space… It should be "obvious" in the sense that what I need is an n × m matrix, where n is the number of documents and m the number of topics.
    Philippe Rivière

    I'm blocked here:

    transformed = lda[corpus_lda]
    X = np.array(transformed)
    embedding = umap.UMAP().fit_transform(X)

    The value of X is an array of lists instead of the 2D numpy array that UMAP expects.

    Philippe Rivière
    I built the np.array by hand and it works
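    Building the matrix by hand can be sketched as follows. With default settings, `lda[bow]` yields, per document, a sparse list of `(topic_id, probability)` pairs, and topics below the model's `minimum_probability` are omitted, so rows have different lengths and `np.array` cannot stack them into a 2D array. The helper name `topics_to_dense` is hypothetical, and the sketch assumes the default output (no `per_word_topics=True`). gensim also ships `gensim.matutils.sparse2full(doc, length)` for a single document and `gensim.matutils.corpus2dense(corpus, num_terms)`, which returns a (num_terms, num_docs) array that would need transposing.

```python
import numpy as np


def topics_to_dense(transformed, num_topics):
    """Turn per-document sparse topic lists into a dense (n_docs, num_topics) array."""
    docs = list(transformed)  # materialise the streamed corpus once
    X = np.zeros((len(docs), num_topics), dtype=np.float32)
    for row, doc_topics in enumerate(docs):
        # Each document is a list of (topic_id, probability) pairs; topics the
        # model filtered out are simply absent, so their cells stay 0.
        for topic_id, prob in doc_topics:
            X[row, topic_id] = prob
    return X
```

    Usage, assuming `lda` is a trained `LdaModel`: `X = topics_to_dense(lda[corpus_lda], lda.num_topics)`, then `umap.UMAP().fit_transform(X)`. Filling zeros for missing topics keeps every row the same width, which is what UMAP needs.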
    Herli Menezes
    Hi, is there any gensim module for the Portuguese language?
    Herli Menezes
    More specifically: how do I handle diacritics in gensim?
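    One way to handle diacritics is to strip them before tokenizing. gensim provides `gensim.utils.deaccent(text)` and the `deacc=True` option of `gensim.utils.simple_preprocess` for this. The stdlib-only sketch below (function name hypothetical) illustrates the general Unicode technique: decompose each character, then drop the combining accent marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove accent marks, e.g. 'coração' -> 'coracao'."""
    # NFD splits an accented character into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category 'Mn'), then recompose the rest
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)
```

    A design caveat for Portuguese: stripping accents merges words that differ only by an accent, such as "e" (and) and "é" (is), so whether to deaccent depends on the task.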
    Hi All, is this channel active?
    @piskvorky Quick question. I know that LSI can return fewer than the requested number of topics (usually for short texts). I think LDA does that, too. How about HDP? Could it ever return fewer than the requested number of topics (in my interpretation, that is the m_T property)?
    Andrew M Olney
    Greetings, I'm teaching a class using gensim at this very moment. All my Windows users have hit “OverflowError: Python int too large to convert to C long” when executing this line of code: fakeDataset = downloader.load('fake-news') I could try to distribute the dataset manually, but are there any other suggestions?
    Andrew M Olney
    I'll put an issue on GitHub. Thanks :)
    I am having an issue with my LDA model: after training, when I try to see the topic distribution of some terms, it gives an empty list []. Could anyone tell me why this is happening? Thanks in advance :)
    Rob Creel

    Good day. I'm going through the tutorials and I'm getting an error. On the run_corpora_and_vector_spaces.ipynb notebook, in the cell with the following code

    for vector in corpus_memory_friendly:  # load one vector into memory at a time

    I get this error

    HTTPError: 404 Client Error: Not Found for url: https://radimrehurek.com/gensim/mycorpus.txt

    The code does not look like it should be calling/visiting a URL, but it seems to be trying and failing to. What's going on here? How may I run the tutorial?

    Machine specs:
    Operating System: Manjaro Linux
    Processors: 4 × Intel® Core™ i5-3320M CPU @ 2.60GHz
    Memory: 15.5 GiB of RAM
    Notebook is running in Jupyter Lab in Firefox 77.0.1 (64-bit)
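    A likely explanation, not confirmed in this thread: the loop itself doesn't fetch anything, but the corpus class defined in an earlier cell opens the tutorial file by URL inside its `__iter__`, so the request only fires when the loop starts, and that URL now returns 404. A workaround sketch, assuming you have a local copy of the one-document-per-line corpus file (the class name and path are hypothetical), streams from disk instead while keeping the "one vector in memory at a time" behaviour:

```python
class MyLocalCorpus:
    """Stream a one-document-per-line text file, yielding one bag-of-words vector at a time."""

    def __init__(self, path, dictionary):
        self.path = path              # local copy of the corpus file (hypothetical path)
        self.dictionary = dictionary  # e.g. a gensim corpora.Dictionary with doc2bow()

    def __iter__(self):
        # Re-open the file on every pass so the corpus can be iterated repeatedly
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                # One document per line, lowercased, whitespace-separated tokens
                yield self.dictionary.doc2bow(line.lower().split())
```

    Usage: `corpus_memory_friendly = MyLocalCorpus("mycorpus.txt", dictionary)`, after which the notebook's `for vector in corpus_memory_friendly:` loop reads from disk rather than the network.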

    Hi. In word2vec it can be useful to distinguish the context to the left of a word from the context to the right.
    Does gensim support this?
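    To make "directional context" concrete: standard skip-gram treats a context word the same whichever side of the target it appears on. The pure-Python sketch below generates (center, context) training pairs in which each context word is tagged with its side. It is an illustration of the idea only, not a gensim feature; the function name and the `L_`/`R_` prefixes are hypothetical.

```python
from typing import List, Tuple


def directional_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) pairs with each context word tagged by side."""
    pairs = []
    for i, center in enumerate(tokens):
        # Left context: up to `window` tokens before the center word
        for j in range(max(0, i - window), i):
            pairs.append((center, "L_" + tokens[j]))
        # Right context: up to `window` tokens after the center word
        for j in range(i + 1, min(len(tokens), i + window + 1)):
            pairs.append((center, "R_" + tokens[j]))
    return pairs
```

    With tagged contexts, "cat" seen before "sat" and "cat" seen after "sat" become different context items, which is the left/right distinction the question describes.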
    Qi Wang
    Has sent2vec been merged yet?
    Data Knight 🎠
    How can I start contributing to gensim?
    Hello everyone. I'm new to gensim, so my question is very simple. How can I train on some easy text data to detect similar text? For example:
    I want to get "animal" as the most similar word when I type "cat, dog, rabbit".
    I'm trying to train on this:
    cat animal
    dog animal
    rabbit animal
    but the result is not what I need: "animal" doesn't come out as the most similar word.
    Stephan ☕️

    I'm trying to get some code for older gensim working on V4.

    This is part of the code:

        # Get the doc2vec labels from indices
        for elem in bestdoc2vec:
            ind = d_indices[elem]
            temp = model1.dv.index_to_doctag(ind)
            resultdoc2vec.append((temp, float(avgdoc2vec[elem])))

    This results in the error: AttributeError: 'KeyedVectors' object has no attribute 'index_to_doctag'.

    Any ideas how to rewrite this code for V4?
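    In gensim 4, `model1.dv` is a `KeyedVectors` object, and the doctags are stored positionally in its plain list `index_to_key`, so the failing line can likely become `temp = model1.dv.index_to_key[ind]`. A sketch of the whole loop as a function, assuming `ind` is a positional index into `model1.dv` (the function name is hypothetical; `bestdoc2vec`, `d_indices` and `avgdoc2vec` are the variables from the original code):

```python
def doc2vec_labels(kv, best, d_indices, avg_scores):
    """Map selected positions back to (doctag, score) pairs using the gensim 4 API."""
    result = []
    for elem in best:
        ind = d_indices[elem]
        # gensim 4.x: index_to_key replaces the removed index_to_doctag() helper
        tag = kv.index_to_key[ind]
        result.append((tag, float(avg_scores[elem])))
    return result
```

    Usage: `resultdoc2vec = doc2vec_labels(model1.dv, bestdoc2vec, d_indices, avgdoc2vec)`.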
