syed yasir shah
@yasirprince423_twitter
I need to install spaCy, but it is showing an error.
I am using Python 3.8.
Ankit Phaterpekar
@phaterpekar
Anyone using spaCy 3.0 for NER tasks in a production environment? Wondering if there are any major issues that would be a reason to avoid spaCy 3.0 in production just yet.
Philipp Sodmann
@p-sodmann
Hi guys, I wrote a small tutorial on how to train a text categorizer in spaCy v3. The new API is awesome when it comes to generating training and test data:
https://medium.com/@psodmann/building-a-text-classifier-with-spacy-3-0-dd16e9979a
Inayat-Khan
@Inayat-Khan
Hi all, greetings. My first question to the community: does anyone have experience successfully running the spaCy Matcher in a notebook on Databricks in Azure? I am getting an error saying jsonschema is not installed. However, the cluster has it, and I also installed it manually (and successfully) using %sh pip install jsonschema, but the error persists. Thanks in advance.
ValueError: [E136] This additional feature requires the jsonschema library to be installed:
pip install jsonschema
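
One thing that may be worth checking here (an assumption, not something confirmed in this thread) is whether `%sh pip` installs into the same Python environment that the notebook kernel imports from. A minimal standard-library sketch that builds a pip command bound to the running interpreter; the helper name `pip_install_command` is hypothetical:

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    # Using `python -m pip` with sys.executable avoids picking up a
    # different `pip` that happens to be first on the shell's PATH.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Suggested command:", " ".join(pip_install_command("jsonschema")))
```

If the printed interpreter path differs from the one the shell's `pip` belongs to, that mismatch would explain an install that succeeds while the import still fails.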
gazakhova
@gazakhova
Hello, I am training an NER model. For each entity I have about 200 labeled examples. Every time I run the script (train on the data), I get different results. Sometimes the F-score is pretty good, but most of the time it's not. What could be the reason? Thanks in advance.
Arunmozhi
@tecoholic
Hi everyone, I created yet another web UI to annotate data for SpaCy NER training after running into index bugs on the existing one. Take a look https://github.com/tecoholic/ner-annotator
Arunmozhi
@tecoholic
A sample where we are tagging covid reports to extract case information from govt bulletins.
Kindly provide feedback and open issues if you think the project is useful and can be improved further. Thank you.
Srijha09
@Srijha09
Hi, I am getting an error "FileNotFoundError: [Errno 2] No such file or directory:\vocab\lexemes.bin" every time I load a custom model in spacy
Spacy version: 2.3.1
Driss Guessous
@drisspg
I am training a textcat component with the "simple-cnn" architecture on an Amazon SageMaker instance with a Tesla K80 (p2.xlarge). I expected to see a fairly large decrease in training time, but I am only seeing around a 25% reduction compared to my 2020 13-inch MBP (non-M1) with no graphics card. Is that in line with what other people have seen? My training loop is nearly identical to the one on spaCy's website.
Anique
@aniquetahir
anyone had luck with NER on tweets?
gayetr
@gayetr
Is anyone aware of any module in spaCy that can pluralize singular nouns and provide inflections, esp. for English and French? Please let me know if there are any suggestions. Thanks!
felipemautner
@felipemautner
Hello everyone! I'm working on an NER project and want to train the model with my own corpora. Can I use the CoNLL file format to do so?
vikasmech
@vikasmech
@drisspg spaCy training for simple-cnn doesn't get that much of an advantage from a GPU.
Mitra Mirshafiee
@mitramir55
Hello everyone! 👋 One quick question.
Is there any way I can make the spaCy rule-based matcher faster? Do you think there are any other models that can find specific rules (I want to search for passive voice) faster than this package?
gayetr
@gayetr
Hi, I need some help creating a Hindi POS tagging model using spaCy. Has anyone already worked on this? Please let me know! Thanks
Florian Schneider
@floschne

Hi everyone :)
I'm trying to use spaCy with GPU (CUDA 10.2) but I'm facing the following issue:

Traceback (most recent call last):
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/language.py", line 1172, in _apply_pipes
    sender.send([doc.to_bytes() for doc in docs])
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/language.py", line 1172, in <listcomp>
    sender.send([doc.to_bytes() for doc in docs])
  File "nn_parser.pyx", line 249, in pipe
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/util.py", line 517, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "nn_parser.pyx", line 249, in pipe
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/util.py", line 517, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "pipes.pyx", line 405, in pipe
  File "pipes.pyx", line 417, in spacy.pipeline.pipes.Tagger.predict
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feed_forward.py", line 40, in predict
    X = layer(X)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 131, in predict
    y, _ = self.begin_update(X, drop=None)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feature_extracter.py", line 14, in begin_update
    features = [self._get_feats(doc) for doc in docs]
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feature_extracter.py", line 14, in <listcomp>
    features = [self._get_feats(doc) for doc in docs]
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feature_extracter.py", line 22, in _get_feats
    return self.ops.asarray(arr, dtype="uint64")
  File "ops.pyx", line 1001, in thinc.neural.ops.CupyOps.asarray
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/cupy/_creation/from_data.py", line 41, in array
    return core.array(obj, dtype, copy, order, subok, ndmin)
  File "cupy/core/core.pyx", line 2004, in cupy.core.core.array
  File "cupy/core/core.pyx", line 2083, in cupy.core.core.array
  File "cupy/core/core.pyx", line 2157, in cupy.core.core._send_object_to_gpu
  File "cupy/core/core.pyx", line 138, in cupy.core.cor

Maybe it's because I'm also running the spaCy pipeline with multiple processes (n_process is set to an arbitrary 12):
for doc in self.spacy_nlp.pipe(self.raw_df['caption'].astype(str), batch_size=100, n_process=self.n_workers): ...
Does anybody know how to resolve this problem?

Florian Schneider
@floschne
Ok damn - it cannot be resolved at the moment: explosion/spaCy#5507
Florian Schneider
@floschne

I was able to resolve it by adding multiprocessing.set_start_method('spawn') as the first statement in my main method.
This makes the worker processes start via spawn instead of being forked from the process that initialized CUDA. The processes then all use CUDA quite efficiently.

gpustat output:
[7] GeForce RTX 2080 Ti | 38'C, 33 % | 9850 / 11019 MB | 7schneid(1037M) 7schneid(1519M) 7schneid(1541M) 7schneid(1491M) 7schneid(1405M) 7schneid(1355M) 7schneid(1485M)
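
The workaround described in this message can be sketched with the standard library alone. This is a minimal illustration, not the poster's actual code: `use_spawn` is a hypothetical helper, and `force=True` is an addition that only keeps the sketch re-runnable in one session.

```python
import multiprocessing


def use_spawn():
    """Select the 'spawn' start method and return the method now in effect."""
    # force=True is an assumption of this sketch so it can be re-run;
    # the message above simply calls set_start_method('spawn') once.
    multiprocessing.set_start_method("spawn", force=True)
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    # Call this before any worker process exists, i.e. before
    # nlp.pipe(..., n_process=...) is used for the first time.
    print(use_spawn())
```

With spawn, each worker starts from a fresh interpreter rather than inheriting the parent's state, which is why it avoids reusing a CUDA context created before the fork.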

gazakhova
@gazakhova

Hello everyone! :) I have one question (also on stackoverflow: https://stackoverflow.com/questions/65680548/spacy-ignored-essential-entities-during-labeling-how-important-is-this-asp).

I'm training a custom Spacy model. Does it affect the results whether I leave entities unlabeled? For example, Germany is LOC. In one example (1) I label this. In the other example (2) I ignore it and label another entity:

Example 1: Germany (LOC) is a country in Central and Western Europe.

Example 2: Germany borders Denmark (LOC) to the north.

It's not about one entity that is occasionally ignored, but about several.

Perin Dhrupal Shah
@perin789

Hi, I am annotating a document with custom entities for a custom NER model. I realized that some entities occur many times in the document and therefore have many annotations, while others occur only a few times and have few annotations. As a result, the custom NER model does not identify the entities with few occurrences at the testing stage.

Please let me know if there is a way to solve this problem in spaCy. I would really appreciate the help. Thanks :)

Milan Mišić
@milan.misic_gitlab
Hi people
I have an issue using spaCy text categorization and can't find any similar issue on the net.
Text categorization, XLNet large model, 152 cats:
torch.autograd.backward(y_for_bwd, grad_tensors=dy_for_bwd)
line 126, in backward
    grad_tensors = _make_grads(tensors, grad_tensors)
line 37, in _make_grads
    + str(out.shape) + ".")
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([5, 128, 768]) and output[0] has a shape of torch.Size([5, 416, 768])
Could someone give me a hint? Maybe I should override some part of spaCy,
e.g. change one = torch.Tensor([1]).float() to one = torch.tensor(1, dtype=torch.float),
or am I on totally the wrong track?
Perin Dhrupal Shah
@perin789
Hi. Can someone please let me know how we can use the Summarizer model in a spaCy pipeline?
gazakhova
@gazakhova

If someone is interested, here is the answer: https://datascience.stackexchange.com/questions/88124/spacy-ignored-essential-entities-during-labeling-how-important-is-this-asp/88125#88125

Antonio Pinto
@byo-ai
Hi, I need a programmer with experience in Dart and Flutter apps who is willing to work a few hours a week. DM me if you are interested.
Antonio Pinto
@byo-ai
Byo.ai is an intelligent assistant to make people carbon neutral/positive. Anyone with experience in one or more general-purpose programming languages, including but not limited to Python, Java, C/C++ (also PyTorch), feel free to send your CV to work@byo.ai (equity only). A passion for the environment, clean technologies and artificial intelligence is a plus!
Darshil Desai
@darshdee

Hi guys, I have a quick question about the performance of the (semantic) similarity function in different spaCy versions:

  • spacy==1.8.2 calculates the similarity score for 500 pairs of words within milliseconds. However, in spacy==2.3.5 this slows down to 2-3 seconds.

Has anyone experienced this? Any idea why this may be?

faizansaeed116
@faizansaeed116
`nlp.update()` was called with two positional arguments. This may be due to a backwards-incompatible change to the format of the training data in spaCy 3.0 onwards.
losses = {}
for text, annotations in TRAIN_DATA:
    nlp.update(
        [text],  # batch of texts
        [annotations],  # batch of annotations
        drop=0.2,  # dropout - make it harder to memorise data
        sgd=optimizer,  # callable to update weights
        losses=losses)
This is the code.
What should I do?
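
For reference, in spaCy 3 `nlp.update` takes a batch of `Example` objects rather than separate lists of texts and annotations. The loop above could be adapted roughly as follows; this is a sketch assuming `nlp`, `TRAIN_DATA` and `optimizer` are set up as in the question, and `train_epoch` is a hypothetical wrapper name:

```python
def train_epoch(nlp, train_data, optimizer, drop=0.2):
    """Run one pass over (text, annotations) pairs with the spaCy v3 update API."""
    # Imported here so this module still loads where spaCy is not installed.
    from spacy.training import Example

    losses = {}
    for text, annotations in train_data:
        # v3 expects Example objects instead of separate text/annotation lists.
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], drop=drop, sgd=optimizer, losses=losses)
    return losses
```

With the names from the question, this would be called as `losses = train_epoch(nlp, TRAIN_DATA, optimizer)`.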
OigreS
@SergioMarreroMarrero

Hello.

I already have some experience using the basics of spaCy. Now I want to take a step further and start with some more spaCy utilities, many of them related to spaCy version 3 (config, projects, custom models, transformers, etc.).

I have already read and worked through all the spaCy usage documentation, but it is too much and too extensive. I need more practical resources, with examples etc. I also took a look at the Explosion GitHub projects, but I don't know which one is recommended to start with. On the other hand, I browsed the web looking for tutorials or good material, but all the resources I found only cover the basics of spaCy.

Over the next months I will develop some NLP projects that involve custom models and probably transformers. I want to carry out these projects using spaCy best practices, but I don't know where to start to acquire those skills.

So, can someone recommend practical material, advice, etc. to help me become proficient?

Thanks in advance.

Niels Horn
@nilq
Hello, can anyone tell me what PronType is?
Alex Griggs
@Doginal

Hey everyone, I have successfully trained my first NER model, but Ray isn't outputting anything after it completes. Any ideas? I'm running this command:

python -m spacy ray train ./training-spacy/config-2.cfg --paths.train ./datasets/train.spacy --paths.dev ./datasets/eval.spacy --address X.X.X.X:6379 -o output --n-workers 5

Python 3.8, spaCy 3

Alex Griggs
@Doginal
No errors at any logging level either. How can I debug this?
Alex Griggs
@Doginal

Hey everyone, I figured out the issue: Ray isn't creating the output folder; spacy train worked.

New question, does anyone have any resources on how to break down AI/machine learning projects for a team? I am trying to get a better grasp on this to help with timelines and budgets. Thanks

eyal
@eyalho

Hey, I get

>>> spacy.require_gpu()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
raise ValueError("GPU is not accessible. Was the library installed correctly?")
ValueError: GPU is not accessible. Was the library installed correctly?

But the GPU does work with PyTorch:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7f07506bd730>
>>> torch.cuda.device_count()
4
>>> torch.cuda.get_device_name(0)
'TITAN Xp'

Any idea why it doesn't work with spaCy?

eyal
@eyalho

conda install -c conda-forge cupy

solved it
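
A plausible reading of this fix (hedged: the thread does not spell out the cause) is that spaCy's GPU support goes through CuPy rather than PyTorch, so a working torch.cuda does not by itself mean spacy.require_gpu() will succeed. A small standard-library sketch to see which of the two packages the current interpreter can find; `gpu_backends` is a hypothetical helper:

```python
import importlib.util


def gpu_backends(names=("cupy", "torch")):
    """Map each package name to whether this interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in gpu_backends().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

This only checks that the packages are installed, not that they can actually reach the GPU.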

eyal
@eyalho

More questions:

  1. When I train an NER pipeline with transformers (following https://spacy.io/usage/training#quickstart) on a small dataset of 10K sentences, it works amazingly :D
    But when I try to train on a bigger one with 700K sentences, it fails with out-of-memory errors...
    How does that make sense?
    I tried making the batch_size smaller, but it did not help.

  2. Is there a formal paper describing the architectures suggested by https://spacy.io/usage/training#quickstart?

Ann Mary Paul
@annmarypaul
Hi everyone, I trained my custom NER model in spaCy 3. Now I want to test its accuracy on a new annotated dataset, but the scorer for computing accuracy doesn't seem to work well in spaCy 3?