Chris Swart
Hi everyone,
I was wondering if people have experience using nlp.pipe to speed up pipeline processing. I have tried different batch sizes (5, 50, 500, 5000) to process 30,000 texts of about 1,500 characters each and have seen no speedup. I am on spaCy 2.0.12, so is this just an issue with my library version, or does batch processing not work?
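For reference, the usual call is nlp.pipe(texts, batch_size=...) rather than calling nlp(text) once per document. Here is a minimal, pure-Python sketch of the batching idea behind it; the minibatch helper and the sample texts are illustrative stand-ins, not spaCy APIs:

```python
# Illustrative sketch of the batching idea behind nlp.pipe: group texts
# into fixed-size batches so each batch can be processed in one pass.
# The helper below is hypothetical, not part of spaCy's public API.

def minibatch(items, size):
    """Yield successive batches of up to `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

texts = [f"text {i}" for i in range(7)]
batches = list(minibatch(texts, size=3))
# 7 texts with size=3 give batches of 3, 3, and 1 items.
```

With spaCy itself, the equivalent would be `docs = list(nlp.pipe(texts, batch_size=500))`; whether that actually yields a speedup depends on which pipeline components support batched processing in your version.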
Tiago Sousa
Hi all, I have a question: How would you create answers for yes-no questions using spacy? Suppose we have a question like "Are you good today?", I would like to have something that automatically answers "Yes, I am good today.". I have tried multiple things but I didn't have any luck :/
I tried updating the existing spaCy NER model with my data, but now it is not able to detect even GPE and the other generic entities it could detect earlier. I know this seems to be the "forgetting" problem that has been mentioned; what is the solution for it? I used 200 sentences with new entity types, and those 200 sentences have only my new entities labelled. Did I miss something? Any suggestions?
Sergiusz Bleja
Is there an easy way to set up an EC2 server that works with spaCy deep learning? I wanted to run through the spacy-transformers intro notebook and thought an Ubuntu deep learning AMI would work out of the box (plus installation of Python packages using conda/pip, of course), but I ran into an issue using the GPU on a p2 machine ("GPU is not accessible. Was the library installed correctly?"). The CUDA toolkit wasn't installed on that machine, but installing it via apt-get didn't do the trick either. Are there better AMIs to start with?
Rahul Shinde
It's the size of the training data that matters the most.
Matt Maybeno
Hey all, anyone try calculating saliency from processed documents through spacy?
to be more specific, calculating the saliency of a given entity from the entire document that was processed. I thought about using the similarity of the document with relation to the entity itself as a quick and dirty first attempt.
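The "quick and dirty" idea above (scoring an entity by its vector similarity to the whole document) can be sketched without spaCy at all. In spaCy the vectors would come from ent.vector and doc.vector; here plain lists and made-up entity names stand in for them:

```python
import math

# Hypothetical sketch: score an entity's salience as the cosine
# similarity between its vector and the document vector. The vectors
# and entity names below are invented for illustration.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_vector = [0.5, 0.5, 0.0]
entity_vectors = {"ACME": [0.5, 0.5, 0.0], "Paris": [0.0, 0.0, 1.0]}
saliency = {name: cosine(vec, doc_vector) for name, vec in entity_vectors.items()}
# "ACME" points in the same direction as the document vector (score 1.0);
# "Paris" is orthogonal to it (score 0.0).
```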
Danyal Andriano
I am trying to set up new entity labels based on menu items, but I am running into issues with overlapping tokens. For instance, there are chicken wings and chicken strips, and "chicken" is used optionally with "wings". However, when I add pattern1 = [{'LOWER': 'buffalo', 'OP': '?'}, {'LOWER': 'chicken', 'OP': '?'}, {'LOWER': 'wings'}] and pattern2 = [{'LOWER': 'chicken'}, {'LOWER': 'strips'}], I get an error for overlapping tokens. How can I have the same token "chicken" be part of multiple entities?
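One common way around the overlap error is to run the Matcher first and keep only the longest non-overlapping matches before adding them as entities; spaCy 2.1+ ships spacy.util.filter_spans for exactly this. A minimal reimplementation of that idea over plain (start, end, label) tuples, with made-up spans:

```python
# Minimal sketch of longest-match filtering (the idea behind spaCy's
# spacy.util.filter_spans): prefer longer spans, then earlier spans,
# and drop anything that overlaps a span already kept.

def filter_spans(spans):
    # Sort by (length, earliness): longest first, ties broken by start.
    sorted_spans = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    result, seen = [], set()
    for start, end, label in sorted_spans:
        if not any(i in seen for i in range(start, end)):
            result.append((start, end, label))
            seen.update(range(start, end))
    return sorted(result)

# Suppose both "chicken wings" (tokens 1-3) and "chicken" (tokens 1-2)
# matched, plus a separate dish at tokens 4-6:
spans = [(1, 3, "DISH"), (1, 2, "INGREDIENT"), (4, 6, "DISH")]
print(filter_spans(spans))  # [(1, 3, 'DISH'), (4, 6, 'DISH')]
```

The shorter overlapping "INGREDIENT" span is dropped in favour of the longer "DISH" span that covers the same token.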
Danyal Andriano
Also, I would like to add words to my lemmatizer, but I am not sure where/how to save the added words. I've tried adding a lookups table and saving it, but it doesn't persist.
Hi, I have updated a spaCy model with my new entity and am now looking into the deployment part. Any leads or help on how to deploy it? I see that when I save the newly trained model, it is saved as a folder structure inside a main folder; to use it, I can load that main folder. But for productionising it, what points should I consider? Any guide or help would be nice, thanks.
@shinde-rahul thanks
Koustuv Sinha
Hi, anyone here got the example from this blog post (https://explosion.ai/blog/spacy-transformers) working? It always throws the following error:
~/miniconda3/envs/dialog/lib/python3.7/site-packages/spacy_transformers/pipeline/tok2vec.py in set_annotations(self, docs, activations)
    196             # to do in one shot without blowing up the memory?
    197             for i, word_piece_slice in enumerate(wp_rows):
--> 198                 doc.tensor[i] = wp_weighted[word_piece_slice].sum(0)
    199             doc.user_hooks["vector"] = get_doc_vector_via_tensor
    200             doc.user_span_hooks["vector"] = get_span_vector_via_tensor

cupy/core/core.pyx in cupy.core.core.ndarray.__getitem__()

cupy/core/_routines_indexing.pyx in cupy.core._routines_indexing._ndarray_getitem()

cupy/core/_routines_indexing.pyx in cupy.core._routines_indexing._prepare_slice_list()

IndexError: too many indices for array
Alessandro Piscopo
Hello everyone. I'm trying to train a spaCy Entity Linking model using Wikidata and Wikipedia, using the scripts in https://github.com/explosion/spaCy/tree/master/bin/wiki_entity_linking. I've generated the KB and moved on to training the model, but it is still not done after more than a week. How long should that normally take? (I'm not using a GPU.)
Is there a pretrained Wikidata entity linking model I can use?
What type of neural network does spaCy use when you build an NER model from scratch? I don't think it's clear in the docs.
Herli Menezes
I've got a problem when trying to use the spaCy Portuguese language model. I used the example given by spaCy: "Esta é uma frase." ("This is a sentence.") For "é" (3rd person of the verb "estar") I got (u'\xe9', u'VERB'). How can I circumvent this?
Should I import codecs or something like that?
Rahul Shinde
@davidbren Please see this video https://youtu.be/sqDHBH9IjRU; it has hints, but I have the same question too.
Hi All, I'm looking for a solution for Korean language (pipeline for RASA, a conversation platform). It seems not an easy task :( Any information or help? Thanks!
Can I use spaCy for text-to-SQL conversion?
John Anderson

I have a token that is a left bracket, that was parsed from the sentence: [carlota] Chicas, ponedla aquí.

(Pdb) pp token

If I check if it is a punctuation it says yes:

(Pdb) token.is_punct

But then I get the part of speech and it says PROPN not PUNCT:

(Pdb) token.pos_
Sam Petulla
@alepiscopo Did the model finish? What is your machine setup?
Carsten Schnober
is it possible to add vectors to an existing model?
I would like to use FastText vectors in nl_core_news_sm
so I can create a new model with python3 -m spacy init-model nl ..., but then I won't have the other pipeline components like sentencizer, NER etc. in that new model
jai priyadarshi
I re-trained my custom spaCy model. What's the method, and how should I evaluate its accuracy?
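One standard way to evaluate an NER model is exact span matching against gold annotations, which is also the idea behind spaCy's Scorer. A self-contained sketch with made-up (start, end, label) spans:

```python
# Hedged sketch of NER evaluation by exact span matching: a prediction
# counts as correct only if its start, end, and label all match a gold
# span. The spans below are invented examples.

def ner_prf(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {(0, 2, "PERSON"), (5, 7, "ORG")}
pred = {(0, 2, "PERSON"), (5, 7, "GPE")}  # second span has the wrong label
p, r, f = ner_prf(gold, pred)
# One of two predictions correct: precision = recall = F1 = 0.5
```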
Gustavo Gonçalves
@alepiscopo When you finished building the KB you didn't get an "The nlp object should have a pretrained ner component." error from the linker training script? If not, what were your parameters to build the KB? Thanks!
Sam Horton
I'm in a position where my company is maintaining a fork of spaCy. I'm trying to determine how the build artifacts that are posted to PyPI are generated so that we can build them ourselves. The README explains how to do a local custom build. However, I am in need of posting to a private pip registry. The best I can determine is that it has something to do with the fabfile.py file and the builds are generated and posted through the buildkite service. Can someone help me?
Sam Horton
Upon closer look, it appears spaCy builds come from this project https://github.com/explosion/wheelwright
Alessandro Piscopo
Hi @spetulla_twitter, the training never finished and always ends with an error. I'm using a 4-core VM on GCP with 256 GB. I get the error while loading the gold_entities.json file.
@gsgoncalves I never got the error you mentioned. I used the default parameters.
Hi, I am new to spaCy. I want to develop a model that gives me text similarity based on intent. For example, "I like cats" and "I hate cats" should be very dissimilar, but when I use "similarity" it gives me a very high similarity.
If I train a model with spaCy's cli.train method, a bunch of models is created. Can anyone please tell me what the difference between the best and the final model is?
I couldn't find any documentation about it. Thank you.
Matt Maybeno
Looking to create a PR, but it requires cupy. Anyone have suggestions on ways to mock it?
How can I split a sentence based on conjunction like 'but' using Spacy?
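With spaCy you would typically iterate over the doc and check token.text or token.dep_ (coordinating conjunctions carry the "cc" dependency label). The splitting logic itself can be sketched over a plain token list, no model needed:

```python
# Illustrative sketch: split a tokenized sentence at a coordinating
# conjunction. A plain list of strings stands in for spaCy tokens here;
# with spaCy you would check token.text or token.dep_ == "cc" instead.

def split_on_conjunction(tokens, conj="but"):
    clauses, current = [], []
    for tok in tokens:
        if tok == conj:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses

tokens = "I like cats but I hate dogs".split()
print(split_on_conjunction(tokens))
# [['I', 'like', 'cats'], ['I', 'hate', 'dogs']]
```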
Jack Park
@Bipinoli I did not split on conjunctions inside spacy but did so in an iterator outside after creating a masterTokens list for each sentence. In my case, it was important to locate the predicate (single-predicate sentence) in order to spot triple structures around that predicate.
Sam Petulla
@alepiscopo I wasn't able to train, either. Has anyone been able to train with the linking script? Curious how much RAM is needed.
Jonathan Bastnagel
Hmm, I can't seem to figure out how to deal with compound words that aren't in
the model, for example "bucketlist" vs. "bucket list".
In theory the similarity for these two should be basically identical.
Is this something the tokenizer should be handling?
Jonathan Bastnagel
@asif-khan17 Sentiment is what you're looking for, not similarity.
Haris Jabbar
I am trying to download/access the vocabulary used by BERT models in spaCy, just the list of 30k tokens. The Vocab.to_disk() method gives only 1,100 tokens. What am I doing wrong?
Hey spaCy enthusiasts, is there an OR operator for the Matcher besides the IN operator? Or in other words: how can I include two words in an IN operator? Example: I want to also match "two rabbits" in pattern = [{'LEMMA': {'IN': ["dog", "cat", "rat"]}}] without creating a second pattern. Thanks
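The IN operator is itself the Matcher's OR: since patterns are plain Python data, extending the list adds another alternative for the same token, and "rabbits" lemmatizes to "rabbit". A hedged sketch (the extra number token using LIKE_NUM is one possible way to also capture "two"):

```python
# Matcher patterns are just lists of dicts, so the "IN" list can be
# extended in place to add alternatives for the same token slot.
pattern = [{'LEMMA': {'IN': ['dog', 'cat', 'rat', 'rabbit']}}]

# One possible extension: optionally match a preceding number token
# ("two rabbits") via LIKE_NUM with the '?' operator.
pattern_with_number = [
    {'LIKE_NUM': True, 'OP': '?'},
    {'LEMMA': {'IN': ['dog', 'cat', 'rat', 'rabbit']}},
]
```

Either pattern would then be registered with matcher.add as usual; no second pattern for "rabbit" is needed.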
Hello everyone, I'm looking for a way to customize the loss function in the text classification model: I'm doing BERT distillation and would like to add the regression part to the loss function. Any idea which part I should rewrite, or should I use a custom component instead?
Zain Muhammad
Is there anyone who has a prebuilt model for entity linking? I don't have enough processing resources to train an EL model from the training file + wiki KB. If yes, please share it with me.
Alessandro Piscopo
@spetulla_twitter I've tried with 312 GB, limiting the training set to 1.5M entities, but after 4 days of training and not much progress I stopped it because it was costly.
It would be good to have an estimate of the time required to train an entity linker (like time per number of items in the training set). Has anybody got anything like that?
Hello, how can we append our custom NER model to the standard spaCy NER model? When I try to append it, it actually gets overwritten.