gayetr
@gayetr
Hi, I need some help creating a Hindi POS tagging model using spaCy. Has someone already worked on this? Please let me know! Thanks
Florian Schneider
@floschne

Hi everyone :)
I'm trying to use spaCy with GPU (CUDA 10.2) but I'm facing the following issue:

Traceback (most recent call last):
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap                                                                        
    self.run()                                                                                                                                                                                
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/multiprocessing/process.py", line 108, in run                                                                               
    self._target(*self._args, **self._kwargs)                                                                                                                                                 
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/language.py", line 1172, in _apply_pipes                                                                
    sender.send([doc.to_bytes() for doc in docs])
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/language.py", line 1172, in <listcomp>
    sender.send([doc.to_bytes() for doc in docs])
  File "nn_parser.pyx", line 249, in pipe
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/util.py", line 517, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "nn_parser.pyx", line 249, in pipe
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/spacy/util.py", line 517, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "pipes.pyx", line 405, in pipe
  File "pipes.pyx", line 417, in spacy.pipeline.pipes.Tagger.predict
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feed_forward.py", line 40, in predict
    X = layer(X)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 131, in predict
    y, _ = self.begin_update(X, drop=None)
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feature_extracter.py", line 14, in begin_update
    features = [self._get_feats(doc) for doc in docs]
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feature_extracter.py", line 14, in <listcomp>
    features = [self._get_feats(doc) for doc in docs]
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/thinc/neural/_classes/feature_extracter.py", line 22, in _get_feats
    return self.ops.asarray(arr, dtype="uint64")
  File "ops.pyx", line 1001, in thinc.neural.ops.CupyOps.asarray
  File "/srv/home/7schneid/miniconda3/envs/wikicaps/lib/python3.8/site-packages/cupy/_creation/from_data.py", line 41, in array
    return core.array(obj, dtype, copy, order, subok, ndmin)
  File "cupy/core/core.pyx", line 2004, in cupy.core.core.array
  File "cupy/core/core.pyx", line 2083, in cupy.core.core.array
  File "cupy/core/core.pyx", line 2157, in cupy.core.core._send_object_to_gpu
  File "cupy/core/core.pyx", line 138, in cupy.core.cor

Maybe it's because I'm also using the spaCy pipeline with multiple processes (n_process is not 0 but set to an arbitrary 12):

for doc in self.spacy_nlp.pipe(self.raw_df['caption'].astype(str), batch_size=100, n_process=self.n_workers): ...

Does anybody know how to resolve this problem?

Florian Schneider
@floschne
Ok damn - it cannot be resolved at the moment: explosion/spaCy#5507
Florian Schneider
@floschne

I was able to resolve it by adding multiprocessing.set_start_method('spawn') as the first statement in my main method.
This causes the worker processes to be spawned instead of forked from the process that initialized CUDA. The processes then all use CUDA quite efficiently.

gpustat output:
[7] GeForce RTX 2080 Ti | 38'C, 33 % | 9850 / 11019 MB | 7schneid(1037M) 7schneid(1519M) 7schneid(1541M) 7schneid(1491M) 7schneid(1405M) 7schneid(1355M) 7schneid(1485M)
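For reference, a minimal sketch of that fix. The pipeline and texts here are placeholders (a blank English pipeline stands in for the real GPU model), but the structure mirrors the snippet above:

```python
import multiprocessing

import spacy


def main():
    # Placeholder pipeline; in the real setup this would be a GPU model
    nlp = spacy.blank("en")
    captions = ["a dog on a beach", "a city skyline at night",
                "two people talking", "a bowl of fruit"]
    for doc in nlp.pipe(captions, batch_size=100, n_process=2):
        print(doc.text)


if __name__ == "__main__":
    # Must run before anything touches CUDA: spawned workers start with a
    # fresh interpreter and initialize CUDA themselves, while forked workers
    # would inherit a CUDA context that cannot be shared across processes.
    multiprocessing.set_start_method("spawn")
    main()
```

Note that with the spawn start method, all multiprocessing code has to live under the `if __name__ == "__main__":` guard, since child processes re-import the module.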

gazakhova
@gazakhova

Hello everyone! :) I have one question (also on stackoverflow: https://stackoverflow.com/questions/65680548/spacy-ignored-essential-entities-during-labeling-how-important-is-this-asp).

I'm training a custom spaCy model. Does it affect the results if I leave some entities unlabeled? For example, Germany is LOC. In one example (1) I label it. In the other example (2) I ignore it and label another entity:

Example 1: Germany (LOC) is a country in Central and Western Europe.

Example 2: Germany borders Denmark (LOC) to the north.

It's not about one entity that is occasionally ignored, but about several.

Perin Dhrupal Shah
@perin789

Hi, I am annotating a document with custom entities for a custom NER model. I realized that some entities occur many times in the document and therefore have many annotations, while other entities have only a few occurrences and annotations. As a result, the custom entities with few occurrences are not being identified by the custom NER model at the testing stage.

Please let me know if there is a way to solve this problem in spaCy. Would really appreciate the help. Thanks :)

Milan Mišić
@milan.misic_gitlab
Hi people,
I have an issue using spaCy text categorization, and I can't find any similar issue on the net.
Text categorization with an XLNet large model, 152 categories. The error:

torch.autograd.backward(y_for_bwd, grad_tensors=dy_for_bwd)
  line 126, in backward
    grad_tensors = _make_grads(tensors, grad_tensors)
  line 37, in _make_grads
    ... + str(out.shape) + ".")
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([5, 128, 768]) and output[0] has a shape of torch.Size([5, 416, 768])

If someone could give me a hint: maybe I should override some part of spaCy and change one = torch.Tensor([1]).float() to one = torch.tensor(1, dtype=torch.float), or maybe I am on a totally wrong track.
Perin Dhrupal Shah
@perin789
Hi. Can someone please let me know how we can use the Summarizer model in a spaCy pipeline?
gazakhova
@gazakhova

If someone is interested, here is the answer: https://datascience.stackexchange.com/questions/88124/spacy-ignored-essential-entities-during-labeling-how-important-is-this-asp/88125#88125

Antonio Pinto
@byo-ai
Hi, I need a programmer with experience in Dart and Flutter apps who is willing to work for a few hours a week. DM me if you are interested.
Antonio Pinto
@byo-ai
Byo.ai is an intelligent assistant to make people carbon neutral/positive. Anyone with experience in one or more general-purpose programming languages, including but not limited to Python, Java, C/C++ (also PyTorch): feel free to send your CV to work@byo.ai (equity only). Passion for the environment, clean technologies and artificial intelligence is a plus!
Darshil Desai
@darshdee

Hi guys, I have a quick question pertaining to the (semantic) similarity function performance in different spaCy versions:

  • spacy==1.8.2 calculates the similarity score for a combination of 500 pairs of words within milliseconds. However, in spacy==2.3.5 this slows down to 2-3 seconds.

Has someone experienced this? Any idea as to why this may be?

faizansaeed116
@faizansaeed116
I'm getting the error:

nlp.update() was called with two positional arguments. This may be due to a backwards-incompatible change to the format of the training data in spaCy 3.0 onwards.

This is the code:

losses = {}
for text, annotations in TRAIN_DATA:
    nlp.update(
        [text],  # batch of texts
        [annotations],  # batch of annotations
        drop=0.2,  # dropout - make it harder to memorise data
        sgd=optimizer,  # callable to update weights
        losses=losses)

What should I do?
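That error is spaCy 3 rejecting the v2-style (texts, annotations) call signature: in v3, nlp.update takes a list of Example objects instead. A minimal sketch of the v3 pattern, using a toy blank pipeline and made-up training data so it is self-contained:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("GADGET")

# Toy data in the same (text, annotations) format as the snippet above
TRAIN_DATA = [("I bought a phone", {"entities": [(11, 16, "GADGET")]})]

optimizer = nlp.initialize()
losses = {}
for text, annotations in TRAIN_DATA:
    # Wrap each (text, annotations) pair in an Example before updating
    example = Example.from_dict(nlp.make_doc(text), annotations)
    nlp.update([example], drop=0.2, sgd=optimizer, losses=losses)
```

With an existing trained pipeline you would call `nlp.resume_training()` instead of `nlp.initialize()` to get the optimizer.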
OigreS
@SergioMarreroMarrero

Hello.

I already have some experience using the basics of spaCy. Now I want to take a step forward and start using some more advanced spaCy utilities, many of them related to spaCy version 3 (config, projects, custom models, transformers, etc.).

I have already read/worked through all the spaCy usage documentation, but it is a lot and quite extensive; I need more practical resources, with examples etc. I also took a look at the Explosion GitHub projects, but I don't know which is recommended to start with. I also browsed the web looking for tutorials or good material, but all the resources I found only cover the basics of spaCy.

Over the next months I will develop some NLP projects that involve custom models and probably transformers. I want to carry out these projects using spaCy best practices, but I don't know where to start to acquire such skills.

So, can someone recommend practical material to work through, advice, etc. in order to gain proficiency?

Thanks in advance.

Niels Horn
@nilq
Hello, can anyone tell me what PronType is?
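For anyone else wondering: PronType is the Universal Dependencies morphological feature for pronominal type (e.g. Prs = personal, Dem = demonstrative, Int = interrogative, Rel = relative). In spaCy it shows up in token.morph; a small sketch building the feature value directly:

```python
import spacy
from spacy.tokens import MorphAnalysis

nlp = spacy.blank("en")
# PronType=Prs is how Universal Dependencies marks a personal pronoun like "she"
morph = MorphAnalysis(nlp.vocab, {"PronType": "Prs"})
print(morph)                  # PronType=Prs
print(morph.get("PronType"))
```

In a trained pipeline, `token.morph.get("PronType")` gives the same kind of value for each token.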
Alex Griggs
@Doginal

Hey everyone, I have successfully trained my first NER model, BUT ray isn't outputting anything after it completes. Any ideas? I'm running this command:

python -m spacy ray train ./training-spacy/config-2.cfg --paths.train ./datasets/train.spacy --paths.dev ./datasets/eval.spacy --address X.X.X.X:6379 -o output --n-workers 5

Python 3.8, spaCy 3.

Alex Griggs
@Doginal
No errors with any logging level either. How can I debug this?
Alex Griggs
@Doginal

Hey everyone, I figured out the issue: ray isn't creating the output folder; spacy train worked.

New question: does anyone have resources on how to break down AI/machine learning projects for a team? I am trying to get a better grasp of this to help with timelines and budgets. Thanks.

eyal
@eyalho

Hey, I get

>>> spacy.require_gpu()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
raise ValueError("GPU is not accessible. Was the library installed correctly?")
ValueError: GPU is not accessible. Was the library installed correctly?

But GPU does work with pytorch..

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7f07506bd730>
>>> torch.cuda.device_count()
4
>>> torch.cuda.get_device_name(0)
'TITAN Xp'

Any idea why it doesn't work with spaCy?

eyal
@eyalho

conda install -c conda-forge cupy

solved it

eyal
@eyalho

Another question:

  1. When I train a transformer-based NER pipeline (following https://spacy.io/usage/training#quickstart) on a small dataset of 10K sentences, it works amazingly well :D
    But when I try to train on a bigger one with 700K sentences, it fails with out-of-memory errors...
    How does that make sense?
    I tried making the batch_size smaller, but it did not help.

  2. Is there any formal paper describing the architectures suggested by https://spacy.io/usage/training#quickstart ?

Ann Mary Paul
@annmarypaul
Hi everyone, I trained my custom NER model in spaCy 3. Now I want to test its accuracy on a new annotated dataset, but the scorer doesn't seem to work well in spaCy 3. Any ideas?
Alex Griggs
@Doginal
@annmarypaul have you tried https://spacy.io/usage/visualizers/?
nirandiw
@nirandiw
@svanschalkwyk Did you find out how we can use spaCy to mask text? The only thing that came close to it was Presidio. What would be the steps if I want to use just spaCy and build an anonymizer?
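A possible starting point for a spaCy-only anonymizer: replace each predicted entity span with its label. This is a sketch under the assumption that a trained NER component fills doc.ents; here the entities are set by hand so the example is self-contained:

```python
import spacy
from spacy.tokens import Span


def anonymize(doc):
    # Replace each entity with its label, working right-to-left so the
    # character offsets of earlier entities stay valid.
    text = doc.text
    for ent in reversed(doc.ents):
        text = text[:ent.start_char] + f"[{ent.label_}]" + text[ent.end_char:]
    return text


nlp = spacy.blank("en")
doc = nlp("Alice flew to Paris")
# With a trained pipeline these spans would come from the NER component
doc.ents = [Span(doc, 0, 1, label="PERSON"), Span(doc, 3, 4, label="GPE")]
print(anonymize(doc))  # [PERSON] flew to [GPE]
```

For real use you would swap the blank pipeline for one with NER (e.g. en_core_web_lg) and drop the manual `doc.ents` assignment.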
Rushabh Patel
@rushabh31
Hi, I am trying to run my custom model on .spacy data and get the results. I am new to spaCy and wondering if there is an easy way to do it.
jamesmcaul
@jamesmcaul
Hello, is there an easy way to implement stratified cross validation using the spacy CLI?
itsupera
@itsupera

Hello,
I want to generate a phonetic transcription from a sentence in Japanese, and I'm trying to see how this can be done with spaCy.

The issue I'm facing is that tokenized chunks can't always be translated into phonemes.
For example, 一週間 (isshuukan) is tokenized into 一 (ichi) and 週間 (shuukan), so the transcription becomes "ichishuukan" instead of the correct "isshuukan".

Using spaCy's ja_core_news_md model, 一週間 is recognized as a named entity.
I could probably exploit this to rechunk the sentence, and get the correct phonemes.
Is this the correct approach to deal with this issue?
Thank you in advance!

itsupera
@itsupera
EDIT: I've tried using PhraseMatcher with terms = ["一週間", "週間", "一"] (to simulate using a dictionary), but it matched 週間 instead of 一週間, not sure why
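One thing worth checking: the PhraseMatcher returns every match, including overlapping ones, and spacy.util.filter_spans can then keep only the longest. A sketch with an English stand-in (since the Japanese tokenizer needs SudachiPy installed); the terms are made up:

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("TERMS", [nlp.make_doc(t) for t in ["New York City", "New York", "City"]])

doc = nlp("She moved to New York City")
spans = [doc[start:end] for _, start, end in matcher(doc)]
# Overlapping matches: "New York City", "New York", "City"
longest = filter_spans(spans)
print([span.text for span in longest])  # ['New York City']
```

If 一週間 really matched only as 週間, it may be that the pattern doc for 一週間 was tokenized differently from the target sentence; the PhraseMatcher compares token sequences, so patterns and text must tokenize the same way.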
itsupera
@itsupera
EDIT2: After more research, I found a tool called ichiran which solves this "segmentation problem" by assigning a score to each possible segment based on length (longer is better), how common the word is, etc., and finding the combination of segments with the best total score.
https://readevalprint.tumblr.com/post/97467849358/who-needs-graph-theory-anyway
Is this something that could be done in spaCy using the PhraseMatcher?
Ben Rockstroh
@brockstroh

I'm having trouble loading en_core_web_sm in an AWS lambda. I have a layer created with spacy and en-core-web-sm as a package (Pipfile below). When I attempt to load the model I am getting "[E893] Could not find function 'spacy.Tok2Vec.v1' in function registry 'architectures'". spacy-legacy is included in the layer.

```
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
spacy = "3.0.5"
requests = "*"
requests-aws4auth = "*"
en-core-web-sm = {file = "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl"}

[dev-packages]

[requires]
python_version = "3.8"
```

Nazira Kaibalina
@NaziraKaibalina
Hello everyone. I'm trying to add support for a new language in spaCy as a personal project. Can anyone please advise how to do it?
Nazira Kaibalina
@NaziraKaibalina
Hello, when developing a language support package, do the NER, POS tagging, and dependency parsing components have to be trained on the same dataset?
gazakhova
@gazakhova

Hi. I am training custom spaCy NER models (v2.x). I have come across an interesting problem. There are languages that are morphologically rich. For example, in German: Nominative: der Name, Accusative: den Namen, Genitive: des Namens. (P.s. This is just an example. I am not going to label an entity with its article. :) ) So the question is: if my model only sees "der Name" in the examples, how should it be able to recognize "den Namen" or "des Namens"?

Lemmatization is unfortunately out of the question, because it changes the start and end character indexes of entities. Another possibility is to label and train all possible forms of an entity. But then comes another problem: if I label "der Name" and "den Namen", then for the spaCy model these are two different entities, right?

How do you deal with this problem? Or any ideas? (Here is my question on Stack Exchange: https://datascience.stackexchange.com/questions/102072/spacy-ner-models-v2x-how-to-deal-with-inflections)

Pierre Snell
@Ierezell
Hi all, hope you're doing great. I have a hack to download spaCy models from scripts using spacy.cli.download, but it's not recognized by linters and such. I would like to know if there is a cleaner way to do it.
Steph van Schalkwyk
@svanschalkwyk
EntityRuler: I am scanning OCR text for (NER) terms. I need to include a matching method that takes into account editing distance (spelling mistakes) and phonetic matching. Is there anything in spaCy I could use? I have fine-tuned a BERT model on these NER entities as well, but due to the smallish corpus, I am getting too many false positives from my model.
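spaCy's EntityRuler itself matches exactly (third-party packages like spaczz add fuzzy matching on top of spaCy), but an edit-distance pre-matching step over OCR tokens can be sketched with just the standard library; the term list and cutoff below are made up:

```python
import difflib

TERMS = ["hypertension", "diabetes", "pneumonia"]


def fuzzy_match(token, terms=TERMS, cutoff=0.85):
    # Return the closest known term within the similarity cutoff, else None.
    # difflib's ratio is a rough stand-in for a normalized edit distance.
    hits = difflib.get_close_matches(token.lower(), terms, n=1, cutoff=cutoff)
    return hits[0] if hits else None


print(fuzzy_match("Hypertenson"))  # hypertension  (OCR dropped an 'i')
print(fuzzy_match("banana"))       # None
```

For the phonetic side, you could compare phonetic keys (e.g. Double Metaphone via a package such as jellyfish) with the same best-match pattern.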
Blake List
@BlakeList

Hi there, apologies if this has already been posted, but I am new to this channel. Does anyone have an idea of how to implement entity linking for ontologies from http://www.obofoundry.org/ (i.e. in .owl or .obo form)? I have a custom named-entity recognition model trained with spaCy projects that features entities extracted from some of these ontologies. I would like to anchor those entities to their respective IDs and train an entity linker component. I also have Prodigy for annotation.

I am a little stuck on how to actually get data from an ontology into the required form for a spaCy knowledge base. Any ideas? Thank you in advance.
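One way to start is to walk the ontology (e.g. with a library like pronto or owlready2) and push each term's ID and synonyms into a spaCy knowledge base. A minimal sketch with one hand-written entry; the GO ID, alias, frequency, and vector length are made-up placeholders:

```python
import spacy

try:
    from spacy.kb import InMemoryLookupKB as KB  # spaCy >= 3.5
except ImportError:
    from spacy.kb import KnowledgeBase as KB     # earlier spaCy 3.x

nlp = spacy.blank("en")
kb = KB(vocab=nlp.vocab, entity_vector_length=64)

# One ontology term: its ID, a frequency prior, and an embedding placeholder
kb.add_entity(entity="GO:0008150", freq=100, entity_vector=[0.0] * 64)
# Aliases map surface forms (labels, synonyms) to candidate entity IDs
kb.add_alias(alias="biological process", entities=["GO:0008150"], probabilities=[1.0])

print(kb.get_size_entities(), kb.get_size_aliases())  # 1 1
```

In a real setup the entity vectors would come from encoding each term's definition, and the KB would be serialized with `kb.to_disk(...)` for the entity_linker component to load.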

sreejapk
@sreejapk
Hi, hope everyone is safe and well. I am facing an issue while loading the English _lg model from a Django project with celery. When I load the model during the restart, it gives an error like the one shown below. But if I move the model loading into the API call, it works fine.

  File "/home/sreeja/app/matching/urls.py", line 3, in <module>
    from matching import views
  File "/home/sreeja/app/matching/views.py", line 36, in <module>
    nlp = spacy.load("en_core_web_lg")
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/__init__.py", line 52, in load
    name, vocab=vocab, disable=disable, exclude=exclude, config=config
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/util.py", line 347, in load_model
    return load_model_from_package(name, **kwargs)
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/util.py", line 380, in load_model_from_package
    return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config)
  File "/home/sreeja/.local/lib/python3.7/site-packages/en_core_web_lg/__init__.py", line 10, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/util.py", line 546, in load_model_from_init_py
    config=config,
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/util.py", line 416, in load_model_from_path
    return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/language.py", line 1966, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/util.py", line 1225, in from_disk
    reader(path / key)
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/language.py", line 1942, in deserialize_vocab
    self.vocab.from_disk(path, exclude=exclude)
  File "spacy/vocab.pyx", line 484, in spacy.vocab.Vocab.from_disk
  File "spacy/vectors.pyx", line 431, in spacy.vectors.Vectors.from_disk
  File "/home/sreeja/.local/lib/python3.7/site-packages/spacy/util.py", line 1225, in from_disk
    reader(path / key)
  File "spacy/vectors.pyx", line 423, in spacy.vectors.Vectors.from_disk.load_vectors
  File "/home/sreeja/.local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
    pickle_kwargs=pickle_kwargs)
  File "/home/sreeja/.local/lib/python3.7/site-packages/numpy/lib/format.py", line 787, in read_array
    array.shape = shape
ValueError: cannot reshape array of size 67577852 into shape (684830,300)

viv
@vistamou
Hello everyone, I want to train a model based on CoNLL-U files for a new language (not included in the list of already supported languages), and there's also no treebank available for it. Do you have any idea if that's possible?
jban-x3
@jban-x3
Hi, has anyone already installed spaCy with GPU support on the new MacBook Pro?
Allan Campopiano
@Alcampopiano
Hello everyone! Could someone point me to the docs that show how to exclude certain punctuation marks from tokenization? I am processing text with emojis in the form ":grinning_face:", so I want to keep the colons.
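One approach (a sketch; the regex is an assumption about what the shortcodes look like): extend the tokenizer's token_match so that :shortcode: strings are never split. In spaCy v3, token_match takes priority over the prefix/suffix rules that would otherwise strip the colons.

```python
import re

import spacy

nlp = spacy.blank("en")

# Assumed shape of the emoji shortcodes, e.g. :grinning_face:
shortcode = re.compile(r":[a-z_]+:")
old_token_match = nlp.tokenizer.token_match


def token_match(text):
    # Keep emoji shortcodes as single tokens; fall back to the old behaviour
    return shortcode.fullmatch(text) or (old_token_match(text) if old_token_match else None)


nlp.tokenizer.token_match = token_match

print([t.text for t in nlp("I am :grinning_face: today")])
```

The same idea works on a loaded pipeline like en_core_web_sm, since the tokenizer is modified in place.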