Prasad Varade
What should be an acceptable loss when training a custom NER model?
Hi, I was wondering if there is a way to get the string corresponding to a hash value if the string itself is not stored in the StringStore?
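As far as I know, no: the hash is one-way, so a string can only be recovered from its hash if it was added to the StringStore at some point. A minimal sketch of the behavior (using a standalone StringStore rather than a pipeline's vocab):

```python
from spacy.strings import StringStore, hash_string

stringstore = StringStore()
h = hash_string("coffee")  # computing the hash alone does not store the string
# Hashing is one-way: since "coffee" was never added to this store,
# stringstore[h] would raise a KeyError here.
key = stringstore.add("coffee")  # storing the string returns the same hash
assert key == h
assert stringstore[h] == "coffee"  # recoverable only because it was stored
```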
Hi, I am trying to use NLP techniques to extract calculation logic/rules from the text of specifications/regulations. For example, given descriptions of measurement rules, I want to derive the calculation logic so it is understandable to computers. Are there good ways to achieve this in spaCy?
Ario K
is it possible to have spaCy load a blank model via the spacy.load() function?
instead of using spacy.blank?
also, is there a default model in spaCy that I don't need to download?
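For reference, spacy.blank is the intended way to get a pipeline without downloading anything, since the language data ships with the spacy package itself; spacy.load expects an installed package name or a path. A quick sketch:

```python
import spacy

# spacy.blank creates a pipeline with just a tokenizer for the given
# language code; no model download is needed.
nlp = spacy.blank("en")
doc = nlp("No model download required for tokenization.")
print([t.text for t in doc])

# spacy.load, by contrast, expects an installed package or a path,
# e.g. nlp = spacy.load("en_core_web_sm") after downloading that package.
```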
Jakub Richtarik
Hi, is there any way to iterate through noun_chunks for the Czech language? I know that Czech isn't supported in the current version... so I tried to install spacy_udpipe, which supports UDPipe models (including Czech) with spaCy's functionality, but this iteration isn't working. Is it because there is no lang/cs/syntax_iterators.py? Should I create one? Thanks for any advice.
Hi, is it mandatory to use the UD tagset when POS tagging with spaCy? Or can we use an altered or custom tagset?
aanifh: you mean for a language model you want to become an official one? you can do whatever you like with models you develop yourself of course
@martijnvanbeers yes, in theory; the (unaltered) UD tagset isn't suitable for the language I'm working on, too Eurocentric
aanifh: I'm not part of the spacy team in any way, but my guess is that for official models you're going to need a really good reason why. it's easier if all the models behave the same as much as possible
I think your best bet is to open an issue asking, and explaining really well why UD isn't sufficient for your language, what things are missing, and how the tagset you propose instead is different from UD, and how much overlap there is
Sam Hoffman
hello! Is there a language-agnostic way to do type checking for spaCy models? Ideally, I'd like to be able to do something like isinstance(nlp, spacy.lang). Unfortunately, when I do that now, I get an error like TypeError: isinstance() arg 2 must be a type or tuple of types
on the other hand, isinstance(nlp, spacy.lang.en.English) returns True, but ideally I could do this test without referencing a specific language
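One language-agnostic option is the shared base class spacy.language.Language, which every language class (English, German, ...) subclasses:

```python
import spacy
from spacy.language import Language

nlp = spacy.blank("de")
# All language-specific classes subclass Language, so this check
# works without naming any particular language.
assert isinstance(nlp, Language)
```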

Hello together,

I have built a spaCy pipeline for binary text classification. The pipeline works fine for models that are available through the spaCy library. In order to compare my existing results with other models (https://github.com/google-research/bert#pre-trained-models), I used the convert_bert_original_tf_checkpoint_to_pytorch.py script (https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py) to convert existing checkpoints to PyTorch models. After that I wanted to "load those PyTorch models from a path" (https://github.com/explosion/spacy-transformers) in my pipeline.

I am able to successfully load those PyTorch models into my pipeline, but when I start training with the same training data, I get this error message:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1812     # remove once script supports set_grad_enabled
   1813     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1814     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

IndexError: index out of range in self

I honestly do not understand this error or how to solve it. After researching the problem, I tried to adjust my config file in different ways, without success. My only lead at the moment: if I reduce my input size to below 200 words, it works fine. I would like to compare those models on the same input data (within BERT's 512-token limit) without truncation.

Does anyone have an idea how I could fix this problem? Any help would be highly appreciated!

Thanks a lot in advance!

Hi all, 1) Does anyone know the number of parameters used in the NER model in version 2.1.8? 2) What is the maximum number of documents after which adding more training data won't add value to the (NER) model?
C Swart
Screenshot 2020-10-15 at 14.09.01.png
Hello, does spaCy NER use word vectors? So the base spaCy NER input is a concatenation of GloVe vectors and other hash embeddings? Img src: https://github.com/explosion/talks/blob/master/2017-11-02_Practical-and-Effective-Neural-NER.pdf
Jack Rory Staunton
anyone using spaCy 3? Is there a way to use spacy-transformers with actual Hugging Face transformer models other than en_core_web_trf? If so, the documentation is not clear on how...
Hi, in a nutshell: how can we create the proper training format for our own corpus? Additionally, we want to add custom entities to train our own custom (German) NER model. We are trying to train a custom NER model on our own data. We found and ran the example Link. The training data seems to be in the following JSON format: Link. We have our own corpus with custom entities and want to bring it into the necessary format. The GoldParse class is gone, if we understood correctly. In spacy.training we found docs_to_json. It seems promising, but we can't manage to add our custom entities to it. Can anyone help us out here? We are quite desperate :D
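For what it's worth, in spaCy v3 training data is serialized as a DocBin of Doc objects with the entities set via character offsets; the resulting .spacy file is passed to spacy train. A minimal sketch with made-up German text, offsets, and labels:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("de")

# Each example: text plus (start_char, end_char, label) for custom entities.
train_data = [("Angela Merkel besuchte Berlin.",
               [(0, 13, "PERSON"), (23, 29, "ORT")])]

db = DocBin()
for text, annotations in train_data:
    doc = nlp(text)
    # char_span maps character offsets onto tokens; it returns None
    # if the offsets don't align with token boundaries.
    doc.ents = [doc.char_span(start, end, label=label)
                for start, end, label in annotations]
    db.add(doc)
db.to_disk("./train.spacy")  # pass this file to `spacy train`
```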
Hi there! I am currently using the spacy train command to train a custom NER model. For my use case, entity-based evaluation is not relevant; I'd prefer to do token-based evaluation. Is there an easy way to use a custom Scorer with the command line, or do I need to write a script for it? Many thanks!
Who has data sets for FA Cup Final 2012, Super Tuesday 2012, and the US Elections 2012?
I am new to NLP and am using spaCy for the first time for languages other than English. Can someone please help me with examples of how to build applications with supported languages for which no models are available? I am trying to use the basic code available in Hindi for tokenization and lemmatization. Any answer will be appreciated!
Hi! May I know how to define the matcher pattern of "NNP/NN NNP/NN", which means it can be "NNP NNP", "NNP NN", "NN NNP", or "NN NN". Thank you!
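One way to express this is a single Matcher pattern where each token's TAG uses the {"IN": [...]} operator, covering all four combinations at once. A sketch (the tags are set by hand here, since a blank pipeline has no tagger; the pattern name is arbitrary):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Each dict is one token; {"IN": [...]} lets TAG be either value, so this
# one pattern covers "NNP NNP", "NNP NN", "NN NNP", and "NN NN".
pattern = [{"TAG": {"IN": ["NNP", "NN"]}},
           {"TAG": {"IN": ["NNP", "NN"]}}]
matcher.add("NOUN_PAIR", [pattern])

doc = nlp("Apple juice tastes good")
# A blank pipeline has no tagger, so set tags manually for this demo.
for token, tag in zip(doc, ["NNP", "NN", "VBZ", "JJ"]):
    token.tag_ = tag
matches = matcher(doc)  # one match: "Apple juice"
```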
Hi there! Does anyone know if it is possible to generate all possible parse trees for a given sentence in spaCy?
I have a custom vector of shape (300,) and I'm trying to obtain its word equivalent or most similar. Is it possible with spacy?
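For the reverse lookup (vector to word), Vectors.most_similar does a similarity search over the vector table. A self-contained sketch with a tiny toy table (a real pipeline would query nlp.vocab.vectors from a package with vectors, such as en_core_web_md):

```python
import numpy as np
from spacy.strings import StringStore
from spacy.vectors import Vectors

strings = StringStore()
vectors = Vectors(shape=(3, 4))  # 3 toy entries, 4 dimensions
for word, row in [("cat", [1, 0, 0, 0]),
                  ("dog", [0.9, 0.1, 0, 0]),
                  ("car", [0, 0, 1, 0])]:
    vectors.add(strings.add(word), vector=np.asarray(row, dtype="float32"))

# most_similar takes a 2D batch of query vectors and returns
# (keys, rows, scores); keys are string hashes resolvable via StringStore.
query = np.asarray([[1.0, 0.0, 0.0, 0.0]], dtype="float32")
keys, _, scores = vectors.most_similar(query, n=2)
words = [strings[int(key)] for key in keys[0]]  # ["cat", "dog"]
```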
Shahid Khan
Hello awesome people of spaCy, does anybody have a Docker image for spacy-nightly with a transformer language model?
Shahid Khan
i am getting an error when training on Google Colab:
Can't find table(s) lexeme_norm for language 'en' in spacy-lookups-data
has anybody faced this issue before?
Modupalli Sreekanth
can anyone explain how to match whitespace in spaCy?
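One option is the IS_SPACE token attribute in the Matcher. Note that the tokenizer only emits whitespace tokens for "extra" whitespace such as double spaces or newlines (a single space between words is not a token). A sketch:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# IS_SPACE matches tokens that consist entirely of whitespace.
matcher.add("WS", [[{"IS_SPACE": True}]])

doc = nlp("one  two\nthree")
matches = matcher(doc)  # the extra space and the newline each match
print([repr(doc[start:end].text) for _, start, end in matches])
```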
Antonio Pinto
Hi There! Anyone with experience in FAIR's blender model and spacy/rasa?
hello everyone, is anyone having import issues with spaCy today?
ImportError: cannot import name 'prefer_gpu' from 'thinc.api'
I've tried updating spaCy and thinc and I'm still getting this error
syed yasir shah
I need to install spaCy but it is showing an error
I am using Python 3.8
Ankit Phaterpekar
anyone using spaCy 3.0 for NER tasks in a production environment? Wondering if there are any major issues that would argue against using spaCy 3.0 in production just yet
Philipp Sodmann
Hi guys, I wrote a small tutorial on how to train a text categorizer in spaCy v3. The new API is awesome when it comes to generating training and test data:
Hi all, greetings to all. My first question to the community: does anyone have experience successfully running the spaCy Matcher in a notebook on Databricks in Azure? I am getting an error saying jsonschema is not installed; however, the cluster has it, and I also manually installed it using %sh pip install jsonschema, and I still get the error. Thanks in advance.
ValueError: [E136] This additional feature requires the jsonschema library to be installed:
pip install jsonschema
Hello, I am training an NER model. For each entity I have about 200 labeled examples. Every time I run the training script, I get different results: sometimes the f-score is pretty good, but most of the time it's not. What could be the reason? Thanks in advance.
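One common source of run-to-run variance is random weight initialization and data shuffling. spaCy ships a helper, spacy.util.fix_random_seed, that seeds the relevant RNGs so repeated runs are comparable; a minimal demonstration of its effect:

```python
import random
from spacy.util import fix_random_seed

# fix_random_seed seeds Python's, NumPy's, and (if available) the GPU's
# RNGs, so training starts from the same initialization every run.
fix_random_seed(42)
a = random.random()
fix_random_seed(42)
b = random.random()
assert a == b  # same seed, same draw
```

With only ~200 examples per entity, some score variance across runs is expected even with a fixed seed, simply because the evaluation set is small.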
Hi everyone, I created yet another web UI to annotate data for spaCy NER training, after running into index bugs in the existing ones. Take a look: https://github.com/tecoholic/ner-annotator
A sample where we are tagging covid reports to extract case information from govt bulletins.
Kindly provide feedback and open issues if you think the project is useful and can be improved further. Thank you.
Hi, I am getting the error "FileNotFoundError: [Errno 2] No such file or directory: \vocab\lexemes.bin" every time I load a custom model in spaCy
Spacy version: 2.3.1
Driss Guessous
I am training a textcat component with the "simple_cnn" architecture on an Amazon SageMaker instance with a Tesla K80 (p2.xlarge). I expected a fairly large decrease in training time; however, I am only seeing around a 25% reduction compared to my 2020 13-inch MBP (non-M1) with no graphics card. Is that in line with what other people have seen? My training loop is nearly identical to the one on spaCy's website.