    allnulled
    @allnulled
    Hello
    Rachna Chakraborty
    @rachnachakraborty
    hi there
    RYeah Sh
    @pantchox
    Hello, great job on the suite of libs!
    I wanted to know if there is a possibility that the tokenizer will output VB/NN/XX instead of just "word"?
    Rachna Chakraborty
    @rachnachakraborty
    @pantchox Thank you for using wink packages.
    Rachna Chakraborty
    @rachnachakraborty
    The tokenizer splits the given text into valid tokens and publishes each token's type as output. For POS tags you will have to use wink-pos-tagger. The output will be in this format: [ { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' }, { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' }, { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' }, { value: 'to', tag: 'word', normal: 'to', pos: 'TO' }, { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' } ]
    Rachna Chakraborty
    @rachnachakraborty
    Here is a better formatted output:
    [
      { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' }, 
      { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' }, 
      { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' },
      { value: 'to', tag: 'word', normal: 'to', pos: 'TO' }, 
      { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' }
    ]
    you can easily .map this array to any format that you may require.
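A minimal sketch of that .map step, assuming the tagged tokens have the shape shown above. The array is copied from the sample output rather than produced by calling wink-pos-tagger, and `toWordSlashPos` is a hypothetical helper name, not part of any wink package:

```javascript
// Sample output copied from the message above; in practice this array
// would come from wink-pos-tagger rather than being hard-coded.
const taggedTokens = [
  { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' },
  { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' },
  { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' },
  { value: 'to', tag: 'word', normal: 'to', pos: 'TO' },
  { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' }
];

// Hypothetical helper: keep only the surface form and its POS tag,
// joined as "value/POS".
function toWordSlashPos(tokens) {
  return tokens.map((t) => `${t.value}/${t.pos}`);
}

const wordSlashPos = toWordSlashPos(taggedTokens);
console.log(wordSlashPos.join(' '));
// He/PRP is/VBZ trying/VBG to/TO fish/VB
```

The same pattern works for any other target format: only the callback passed to .map changes.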
    RYeah Sh
    @pantchox
    thanks
    thecodingcrow
    @thecodingcrow
    I have a question about the wink-ner package: is there a specific format needed for the training data? The example uses "text" and "entityType"; is this relevant?
    Prateek Saxena
    @prtksxna
    Hey @thecodingcrow 👋🏽
    So the training data needs to be an array of objects where the properties text and entityType are required
    You can optionally add more properties like uid and value that'll be returned as is if that entity is found
    @thecodingcrow, you can read more about this here — https://winkjs.org/wink-ner/NER.html#learn (see the Parameters section after the code example)
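A rough sketch of a pre-check for that format, assuming only what is stated above: each entry needs text and entityType, while uid and value are optional. `findInvalidEntries` is a hypothetical helper, not part of wink-ner, and the sample entries are made up for illustration:

```javascript
// Hypothetical helper (not part of wink-ner): returns the entries that
// lack a non-empty string `text` or a non-empty string `entityType`.
function findInvalidEntries(trainingData) {
  const isNonEmptyString = (v) => typeof v === 'string' && v.trim() !== '';
  return trainingData.filter((entry) =>
    entry == null ||
    !isNonEmptyString(entry.text) ||
    !isNonEmptyString(entry.entityType)
  );
}

// Made-up sample data for illustration.
const sampleTrainingData = [
  { text: 'united kingdom', entityType: 'country', uid: 'uk' }, // optional uid
  { text: 'php', entityType: 'skill' },                         // required fields only
  { text: 'python' }                                            // missing entityType
];

const invalidEntries = findInvalidEntries(sampleTrainingData);
console.log(invalidEntries.length); // 1
console.log(invalidEntries[0].text); // python
```

Running a check like this before calling learn() makes it easier to spot entries that were mangled while converting a third-party data set.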
    Prateek Saxena
    @prtksxna
    Hope that helps! Let me know if you run into anything else 😊
    thecodingcrow
    @thecodingcrow
    Hey @prtksxna :) thanks for your fast response and nice explanation. I had already stumbled across this file, so things should be clear by now. What I am still wondering about is whether uid and value have a special functionality in terms of NER. I am very new to this.
    Rachna Chakraborty
    @rachnachakraborty
    Hey @thecodingcrow, to understand why value and uid are needed, let's look at the example on RunKit. It shows that tokenization is a prerequisite to Named Entity Recognition (NER): a text is first tokenised, and only then are the entities in it identified. The value represents the contents of a token, and uid represents the unique id shared by the various text patterns that stand for a single entity. While preparing content for the learn() API, you can define multiple patterns of an entity with one unique identity (for example, uid as uk), which helps the recognize() API detect all of these patterns as one entity. You can try this by learning u.k./UK/United Kingdom with uid as uk and testing the outcomes with various combinations of these patterns. Hope this is useful. Cheers!
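The idea of several patterns sharing one uid can be sketched in plain JavaScript. This is only an illustration of the lookup concept, not how wink-ner is implemented: it matches whole strings rather than token sequences, and `buildLookup` and `resolve` are hypothetical names:

```javascript
// Illustrative entries: three surface patterns share the uid 'uk'.
const ukTrainingData = [
  { text: 'u.k.', entityType: 'country', uid: 'uk' },
  { text: 'UK', entityType: 'country', uid: 'uk' },
  { text: 'United Kingdom', entityType: 'country', uid: 'uk' }
];

// Build a lookup table from a lower-cased pattern to its entity record.
function buildLookup(entries) {
  const lookup = new Map();
  for (const { text, entityType, uid } of entries) {
    lookup.set(text.toLowerCase(), { entityType, uid });
  }
  return lookup;
}

const lookup = buildLookup(ukTrainingData);

// Resolve a phrase to its entity record, or undefined if it is unknown.
const resolve = (phrase) => lookup.get(phrase.toLowerCase());

console.log(resolve('united kingdom').uid); // uk
console.log(resolve('U.K.').uid);           // uk
console.log(resolve('France'));             // undefined
```

Whichever variant appears in the text, downstream code only has to compare against the single uid.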
    thecodingcrow
    @thecodingcrow

    Thanks for this explanation, now it is clear for me what the uid is used for. :)
    I was already able to set up my own NE recognizer, but it is not quite doing what I had expected.
    I want to build a resume parser, and for this I'm trying to extract all the needed information with NER. I used a training data set from the internet, which I manipulated into the form { text: "sample", entityType: "sample" }. After I applied .learn() and .recognize(), nearly no entity was found correctly; everything was either word, punctuation or alien. I wanted to look for names, skills, experience, etc.
    I had a look at the data set, and my idea is that the recognizer is kind of overfitted. (The data set consists mostly of Indian resumes, and the 'text' values are sometimes quite long, for example "C# (1 year), C++, JS".)
    My question now is: is there a way to really teach the recognizer what I am looking for, or is it just checking whether the desired strings are found anywhere in the tokens?

    Sorry for the spam, but I wanted to make things clear :)
    Thanks in advance for any help, I really appreciate this chat!
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    wink-ner is a gazetteer-based (i.e. look-up driven) NER which can spot patterns smartly. Therefore it will be able to spot skills, cities etc. easily, but names and experience could be tricky, especially if you are looking for generalization. We will try and share some ideas on how you can achieve some of it in the next couple of days. If you have annotated data, then you could consider using wink-perceptron to achieve the objective. There is a similar case study on our blog: NLP in Agriculture.
    Sanjaya Kumar Saxena
    @sanjayaksaxena

    Here is a simple example that may help you:

    // Load the gazetteer-based NER and the tokenizer.
    var NER = require( 'wink-ner' );
    var Tokenizer = require( 'wink-tokenizer' );
    
    var tokenize = Tokenizer().tokenize;
    var ner = NER();
    
    // Spelling variants of the same skill share one uid.
    var trainingData = [
      { text: 'c + +', entityType: 'skill', uid: 'C++' },
      { text: 'c #', entityType: 'skill', uid: 'C#' },
      { text: 'php', entityType: 'skill', uid: 'PHP' },
      { text: 'my sql', entityType: 'skill', uid: 'MySQL' },
      { text: 'mysql', entityType: 'skill', uid: 'MySQL' },
      { text: 'python', entityType: 'skill', uid: 'Python' },
      { text: 'javascript', entityType: 'skill', uid: 'Javascript' },
      { text: 'java script', entityType: 'skill', uid: 'Javascript' },
      { text: 'nodejs', entityType: 'skill', uid: 'Node.js' },
      { text: 'node js', entityType: 'skill', uid: 'Node.js' },
      { text: 'web design', entityType: 'skill', uid: 'Web Design' },
    ];
    ner.learn( trainingData );
    
    var r = 'I have worked in C++, node js, MY SQL, extensively and have limited web design experience! My email is r2d2@gmail.com.';
    
    // Tokenize first, then run the recognizer over the tokens.
    var tokens = tokenize( r );
    tokens = ner.recognize( tokens );
    
    // Recognized skills carry a uid; e-mail addresses are tagged by the tokenizer.
    tokens.forEach( ( t ) => {
      if ( t.uid ) console.log( `Skill: ${t.uid}` );
      if ( t.tag === 'email' ) console.log( `E-Mail: ${t.value}` );
    } );

    This produces the following output:

    Skill: C++
    Skill: Node.js
    Skill: MySQL
    Skill: Web Design
    E-Mail: r2d2@gmail.com

    Please download and use the latest version of wink-ner.

    Hi @thecodingcrow, do let us know if you need any further help. Do download the latest version of wink-ner!
    Ender Minyard
    @genderev
    Hi, I’m hoping to build a very intelligent search engine in the browser.
    I’ve been using string similarity algorithms but I’m frustrated because they don’t recognize the intent of the user.
    I’m excited to explore winkjs 🙂
    Rachna Chakraborty
    @rachnachakraborty
    @genderev Thank you.
    @genderev Please feel free to write to us if you need any input on using the winkjs packages. All the best!
    Ender Minyard
    @genderev
    How do you use the winkJS node modules in the browser? I tried using browserify for the bm25 text search with no success.
    Prateek Saxena
    @prtksxna
    @genderev, were you able to get it running?
    Ateeq
    @wenning247_twitter
    Does Wink support the Arabic language in sentiment analysis?
    Or how do I train the model to do such a thing?
    Labs
    @labs20
    Hi there. Great work! Are word2vec and doc2vec in your plans?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Thanks! Yes, they are on our roadmap and will be released in some time.
    Ron Dahan
    @RonDaha
    Hey, is there a way to also configure a context for an entity? Let's say I am looking for an entity but need to check whether it is found in the context of another entity or of certain words. This is regarding the NLP package's learnCustomEntities().
    Pallavi Ratra
    @pallaviratra
    Hello, could you please elaborate on your query with an example? That’ll help us in giving you the appropriate response!
    Rakesh PK
    @pkrakesh_twitter
    Can we use Wink js with react native?
    Alex Harwood
    @alexanderpharwood
    Hiya, I am using Wink (thank you very much for the project), to extract keywords from PDFs. I am wondering if it is possible to extend the dictionary (lexicon?) of the tokeniser so that it recognises multiple words as one token, for instance "Microsoft office", or "United Kingdom". Perhaps there is another feature I should be using for this instead of the tokeniser? Apologies if this has already been documented and I have missed it! Any help appreciated. Cheers
    Alex Harwood
    @alexanderpharwood
    After some further research, the addRegex method seems to do exactly what I want. I have a rather large list of custom tokens; do you expect any significant performance issues if I were to add a few thousand, for example?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Please look at https://winkjs.org/wink-nlp/custom-entities.html; it should fulfill your need. Even with a large data set you should get good performance.
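For a large list of multi-word tokens, one possible way to prepare the input for learnCustomEntities() is sketched below. It assumes the `{ name, patterns }` entry shape described on the linked page; `buildEntityPatterns` and the sample phrases are hypothetical, and whether literal phrases need lower-casing or escaping should be checked against that page:

```javascript
// Hypothetical helper: turn a plain object of phrases grouped by entity
// name into an array of { name, patterns } entries, dropping blank phrases.
function buildEntityPatterns(phrasesByName) {
  return Object.entries(phrasesByName).map(([name, phrases]) => ({
    name,
    patterns: phrases.map((p) => p.trim()).filter((p) => p.length > 0)
  }));
}

// Sample phrases taken from the question above, plus one blank entry.
const customTokens = {
  product: ['Microsoft office'],
  country: ['United Kingdom', '  ']
};

const entityPatterns = buildEntityPatterns(customTokens);
console.log(JSON.stringify(entityPatterns));

// With wink-nlp loaded, the array would then be handed over in one call
// (assumed usage, see the linked page):
// nlp.learnCustomEntities( entityPatterns );
```

Keeping the phrases in a plain object or JSON file and converting them in one step avoids registering thousands of patterns by hand.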
    Alex Harwood
    @alexanderpharwood
    Brilliant -- much appreciated!
    Wenzel
    @creadicted

    Really nice work you have done here!

    Do you have a recommendation for how to extract the entity (if one is associated) for each word in a sentence? So far I combine information from different sources of 'doc', and it feels really hacky. Also, when extracting the entities, is there a way to find out what type each one is?

    Pallavi Ratra
    @pallaviratra
    Hi @creadicted , you can look at the example here https://winkjs.org/wink-nlp/entities.html to see how to extract entities from a specific sentence. To extract the entity types you can use the "its.detail" property explained here - https://winkjs.org/wink-nlp/its-as-helper.html
    Wenzel
    @creadicted
    Thank you! So there is no POS-like command for entities? Then I'm not building something that someone else has made already :)
    hariom-sinha58
    @hariom-sinha58
    I want to classify my tags of words or tokens into categories. Is this possible here?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Hi @hariom-sinha58 can you please give an example of what exactly you are looking for?
    hariom-sinha58
    @hariom-sinha58
    I have a form builder where users can add tags such as their skills or hobbies. Let's say they selected JavaScript, Cricket, Football, Harry Potter, Python; based on these tokens I would like to classify that person's interests into sports, coding, other, etc.
    Thanks for quick response :)