James
@T4rqu1n_twitter
Have you tried changing models? I have no idea, but it sounds as if the word vectors aren't present in this one?
I'm a monster noob though, so probably no idea what I'm on about
Greg Werner
@demongolem
The word vectors are definitely in the lg model, and that's the model my code uses where I save the doc object; it's the sm model that lacks them. However, when reading back from disk, I don't know if there is something I have to do explicitly to restore the word vectors (that is, if the word vectors were really stored to disk in the first place)
James
@T4rqu1n_twitter
Sorry I can't be of any help :(
I haven't had any response to my question either, so communication in this channel is a bit hit and miss
Greg Werner
@demongolem
Persisting both doc and doc.vocab in the above manner worked for me.
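For anyone hitting the same issue, a minimal sketch of persisting both the doc and its vocab (using byte serialization and a blank pipeline here so it runs without a model download; Greg's case used en_core_web_lg and disk files, where `to_disk`/`from_disk` work the same way):

```python
import spacy
from spacy.tokens import Doc
from spacy.vocab import Vocab

nlp = spacy.blank("en")
doc = nlp("Persist this doc")

# Serialize both the doc and the vocab it references
doc_bytes = doc.to_bytes()
vocab_bytes = doc.vocab.to_bytes()

# Restore: rebuild the vocab first, then the doc on top of it
vocab = Vocab().from_bytes(vocab_bytes)
restored = Doc(vocab).from_bytes(doc_bytes)
print([t.text for t in restored])  # ['Persist', 'this', 'doc']
```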
UTKARSH VARDHAN
@u6yuvi_twitter
I trained a custom NER model with 20 labels. However, when doing inference I get one odd tag (O-) for one particular word, "Progress", even though that is not part of my label set. I wondered whether this is because of an out-of-vocabulary word, but I could find the word "progress" in the vocabulary. What am I missing here?
ravishpankar
@ravishpankar
Hello, I need to understand the logic behind unseen-entity recognition in spacy using a custom NER model trained on some job descriptions. Please, somebody help.
ravishpankar
@ravishpankar
I have some labels for job skills. I don't know how many JDs I need to train on for the IT domain. I need some assurance that if I train on 200 JDs for different job titles, the predictions will be almost accurate. I have trained my model on 46 JDs and the predictions seem to be, in a crude sense, 70 percent accurate. Somebody please help.
salvatore
@erotavlas
@ravishpankar that seems like too few examples to train on if you mean 46 annotations
salvatore
@erotavlas
Hello, when experimenting with the output model from the spacy pretrain command, I found that whenever I use --init-tok2vec to train a new model along with the word vectors, it always results in a model with higher recall but lower precision. When I train without --init-tok2vec (just the word2vec vectors), the model scores seem more balanced (with better precision). For the amount of time it takes to train the pretrained model, it seems the effort doesn't pay off that much and results in degraded performance (in my case, more false positives). Any thoughts on this?
ravishpankar
@ravishpankar
@erotavlas, thanks. I'm annotating 20 sentences per job description
barataplastica
@barataplastica

when I'm installing the most recent version of spacy and loading 'en_core_web_md' I'm getting a vocab size of 489

import spacy
nlp = spacy.load('en_core_web_md')
len(nlp.vocab)
489

When I do the same for version 2.2.4 I get

import spacy
nlp = spacy.load('en_core_web_lg')
len(nlp.vocab)
1340241

Am I doing something wrong with the newest version?

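(This is expected behaviour rather than a bug: from spaCy v2.3 the models no longer ship a big pre-populated lexeme table, so the vocab starts small and fills in lazily as text is processed. A quick way to see the lazy growth, sketched with a blank pipeline so no model download is needed:)

```python
import spacy

nlp = spacy.blank("en")
before = len(nlp.vocab)

# Processing text creates lexemes on demand, growing the vocab
nlp("The quick brown fox jumps over the lazy dog.")
after = len(nlp.vocab)

print(before, after)  # exact numbers vary by spaCy version
```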
Jakub Richtarik
@richtarik.jakub_gitlab
Hi there, I need help with spacy's noun_chunks, could somebody help me?
martijnvanbeers
@martijnvanbeers
richtarik.jakub_gitlab: that's hard to know without hearing your actual question, isn't it? :)
Jakub Richtarik
@richtarik.jakub_gitlab

I have text "galaxy in the constellation Cetus"
when i run code (the chat stripped the "__" separators and underscores, restored here):

nlp = spacy.load("en_core_web_sm")
doc = nlp("galaxy in the constellation Cetus")
for chunk in doc.noun_chunks:
    print(chunk.text + "__" + root.text + "__" + root.dep_ + "__" + root.head.text)

it returns:

the constellation__constellation__pobj__in
--- [the, Cetus]
galaxy in the constellation Cetus
[]

but when i ran it on spacy's page online, it returns that galaxy is the ROOT
martijnvanbeers
@martijnvanbeers
maybe the online demo is using one of the larger models?
Jakub Richtarik
@richtarik.jakub_gitlab
it also uses en_core_web_sm ....the only difference is that i have version 2.3.1 and theirs is 2.3.0
martijnvanbeers
@martijnvanbeers
when I run the code you pasted it complains that root isn't defined, so you didn't show full code
Jakub Richtarik
@richtarik.jakub_gitlab
nlp = spacy.load("en_core_web_sm")
doc = nlp("galaxy in the constellation Cetus")
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)

..... of course + import spacy on the top
martijnvanbeers
@martijnvanbeers
and what is it that you want to achieve? with a statistical model like spacy, you will never get the 'correct' result for every text you feed it
Jakub Richtarik
@richtarik.jakub_gitlab
so the question is...why online version finds galaxy as root, but mine not
i want to extract definition words from sentences.... and noun_chunks help a lot
or keywords
....that sounds better
martijnvanbeers
@martijnvanbeers
for me on 2.1.8, it finds chunks "galaxy", "the constellation" and "Cetus". since you only pasted output for "the constellation" I'm going to assume you're talking about that one
so you want root.head.text to return galaxy there?
Jakub Richtarik
@richtarik.jakub_gitlab
yes, i need right this
martijnvanbeers
@martijnvanbeers
where's this online demo?
Jakub Richtarik
@richtarik.jakub_gitlab
you can paste a code there and run it
martijnvanbeers
@martijnvanbeers
I mean the exact url
oh, you modified the code in that example?
Jakub Richtarik
@richtarik.jakub_gitlab
yes, i modified it
Jakub Richtarik
@richtarik.jakub_gitlab
it looks like the newest version doesn't always work correctly
martijnvanbeers
@martijnvanbeers
if you always want the ROOT of the sentence, you have to iterate through the dependency tree till you reach it, not just get the nearest token.head
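A minimal sketch of that climb. spaCy's real Token works the same way (the sentence root is its own head), but a tiny stand-in class is used here so the snippet runs without a model:

```python
class Token:
    """Tiny stand-in for spacy.tokens.Token: the sentence root is its own head."""
    def __init__(self, text):
        self.text = text
        self.head = self  # a token is a root until attached elsewhere

def climb_to_root(token):
    # Follow .head until we reach the token that is its own head (the ROOT)
    while token.head is not token:
        token = token.head
    return token

# "galaxy in the constellation Cetus": galaxy is the ROOT
galaxy, in_, constellation = Token("galaxy"), Token("in"), Token("constellation")
in_.head = galaxy
constellation.head = in_

print(climb_to_root(constellation).text)  # galaxy
```

(With a real spaCy token, `chunk.root.sent.root` gets you the same thing directly.)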
Jakub Richtarik
@richtarik.jakub_gitlab
hmm i see.... i hope i understand it now, thanks a lot maaan
ravishpankar
@ravishpankar
Can I train a spacy dep parser model with the Stanford dataset?
I need a dep parser that works like the Stanford NLP dep parser. spacy and allenai don't perform well on text like "creates test cases", "executes the deployment", ... sentences that don't have a subject. Anybody, please help.
Asbjørn Heid
@aheid_gitlab
Hi, nlp n00b here (yay)... say I wanted to make a voice-driven calculator, and I have the speech-to-text working, would spacy be a good fit for processing the text, extracting numbers and operators? I see it can classify stuff with like_num, but from what I can see I'd have to combine multiple "like_num" tokens to a single number myself somehow, correct?
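Yes, you'd have to combine them yourself. One sketch of that merging step, once you've collected a consecutive run of tokens where `token.like_num` is True (the `WORD_VALUES` table and `words_to_number` helper below are made up for illustration, not a spaCy API):

```python
# Hypothetical lookup table for number words (illustration only)
WORD_VALUES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "hundred": 100, "thousand": 1000,
}

def words_to_number(words):
    """Combine a run of number words like ['one', 'hundred', 'twenty', 'five'] -> 125."""
    total, current = 0, 0
    for w in words:
        value = WORD_VALUES[w.lower()]
        if value == 100:
            current = max(current, 1) * 100
        elif value == 1000:
            total += max(current, 1) * 1000
            current = 0
        else:
            current += value
    return total + current

print(words_to_number(["one", "hundred", "twenty", "five"]))  # 125
```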
michael-w-darling
@michael-w-darling
Hello everyone! Also an NLP noob here, but reading over SpaCy’s documentation has me super excited to learn more about it.
I’m able to use the NLP pipelines for all I need, but I'm having a difficult time exporting the tokens, entities, etc. for presenting findings on customer surveys. For example, if I want to keep the customer ID, sentiment, and the key topics/entities so I can link those topics back to a customer or survey, what would be the best approach or reference material?
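One common pattern (a sketch, not the official way; the survey fields here are assumptions, and the blank pipeline stands in for whatever model you've loaded) is to flatten each doc into one row per survey and write that out:

```python
import csv
import spacy

nlp = spacy.blank("en")  # stand-in; in practice load a full pipeline with NER etc.

# Hypothetical survey records
surveys = [
    {"customer_id": "C001", "text": "Delivery was slow but support was great."},
    {"customer_id": "C002", "text": "Love the product."},
]

rows = []
for survey in surveys:
    doc = nlp(survey["text"])
    rows.append({
        "customer_id": survey["customer_id"],
        "tokens": " ".join(t.text for t in doc),
        # a blank pipeline has no NER, so ents is empty here; a real model fills it
        "entities": "; ".join(f"{e.text}|{e.label_}" for e in doc.ents),
    })

with open("survey_nlp.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["customer_id", "tokens", "entities"])
    writer.writeheader()
    writer.writerows(rows)
```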
ym-han
@ym-han
@ravishpankar you might want to look into spacy-stanza
Edmond Varga
@vedtam
Hello guys! I'm wondering if there is a way to extract the geographic region name from a nationality, like getting America from American, using Spacy. So far, I've tried to experiment with lemmatisation, but the lemmatised form is still American. :(
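spaCy's English NER will tag "American" as a NORP entity, but it has no built-in demonym-to-place mapping, so the American -> America step needs your own lookup table. A sketch (the table and helper below are made up for illustration):

```python
# Hypothetical demonym lookup table (illustration only, not a spaCy API)
DEMONYM_TO_PLACE = {
    "american": "America",
    "french": "France",
    "german": "Germany",
    "japanese": "Japan",
}

def place_for_demonym(text):
    """Map a nationality word to its region name, or None if unknown."""
    return DEMONYM_TO_PLACE.get(text.lower())

print(place_for_demonym("American"))  # America
```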
Ryan H. Lewis
@rhl-
Hey everyone, i'm following explosion/spaCy#4486; we are exploring high memory usage with spacy, and I was wondering if anyone knows where this valgrind tutorial lives?
I've used valgrind many times, so i'm not anticipating any real issues, but i just would love to see the notes.