    Prateek Saxena
    @prtksxna
    Hope that helps! Let me know if you run into anything else 😊
    thecodingcrow
    @thecodingcrow
    Hey @prtksxna :) thanks for your fast response and nice explanation. I had already stumbled across this file, so things should be clear by now. What I am still wondering is whether this uid and value have a special function in terms of NER. I am very new to this.
    Rachna Chakraborty
    @rachnachakraborty
    Hey @thecodingcrow, to understand why value and uid are needed, let's look at the example on RunKit: it shows that tokenization is a prerequisite to Named Entity Recognition (NER), so a text is first tokenized before the entities in it are identified. The value holds the contents of a token, and the uid is the unique id shared by the various text patterns that represent a single entity. While preparing content for the learn() API, multiple patterns of an entity can be defined with the same uid (for example, uid as uk), which helps the recognize() API detect these patterns as one entity. You can try this by learning u.k./UK/United Kingdom with uid as uk and testing the outcomes with various combinations of these patterns. Hope this is useful. Cheers!
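    To make the uid idea concrete, here is a minimal, self-contained sketch of a lookup that maps several surface patterns to one uid. It is a toy illustration only and does not use or reproduce wink-ner's implementation: the learn and recognize functions, the whitespace-based word splitting, and the 'country' entity type below are all hypothetical stand-ins written for this example.

```javascript
// Toy gazetteer: maps several surface patterns to a single uid.
// Not wink-ner's implementation; all names here are hypothetical.
const patterns = new Map();

// Normalize a pattern: lowercase and collapse runs of whitespace.
function normalize( text ) {
  return text.toLowerCase().trim().split( /\s+/ ).join( ' ' );
}

// Register each pattern under its uid and entity type.
function learn( entries ) {
  entries.forEach( ( e ) => {
    patterns.set( normalize( e.text ), { uid: e.uid, entityType: e.entityType } );
  } );
}

// Scan whitespace-separated words, preferring the longest matching pattern.
function recognize( text, maxWords = 3 ) {
  const words = normalize( text ).split( ' ' );
  const found = [];
  let i = 0;
  while ( i < words.length ) {
    let matched = false;
    for ( let n = Math.min( maxWords, words.length - i ); n > 0; n -= 1 ) {
      const hit = patterns.get( words.slice( i, i + n ).join( ' ' ) );
      if ( hit ) {
        found.push( hit.uid );
        i += n;
        matched = true;
        break;
      }
    }
    if ( !matched ) i += 1;
  }
  return found;
}

learn( [
  { text: 'u.k.', entityType: 'country', uid: 'uk' },
  { text: 'uk', entityType: 'country', uid: 'uk' },
  { text: 'united kingdom', entityType: 'country', uid: 'uk' }
] );

// Both spellings resolve to the same uid.
console.log( recognize( 'She moved from the United Kingdom to the UK office' ) );
```

    Because every pattern carries the same uid, downstream code can count or group mentions without caring which spelling appeared in the text.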
    thecodingcrow
    @thecodingcrow

    Thanks for this explanation, now it is clear for me what the uid is used for. :)
    I was already able to set up my own NE recognizer, but it is not quite doing what I had expected.
    I want to build a resume parser, and for this I'm trying to get all the needed information with NER. I used a training data set from the internet, which I manipulated to get the form { text: "sample", entityType: "sample" }. After I applied .learn() and .recognize(), nearly no entity was found correctly; everything was either word, punctuation or alien. I wanted to look for names, skills, experience, etc.
    I had a look at the data set, and my idea is that the recognizer is kind of overfitted. (The data set consists mostly of Indian resumes, and the 'text' values are quite long sometimes, for example "C# (1 year), C++, JS".)
    My question now is: is there a way to really teach the recognizer what I am looking for, or is it just checking whether the desired strings are found anywhere in the tokens?

    Sorry for the spam, but I wanted to make things clear :)
    Thanks in advance for any help, I really appreciate this chat!
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    wink-ner is a gazetteer-based (i.e. look-up driven) NER that can spot patterns smartly. Therefore it will be able to spot skills, cities etc. easily, but names and experience could be tricky, especially if you are looking for generalization. We will try and share some ideas on how you can achieve some of it in the next couple of days. If you have annotated data then you could consider using wink-perceptron to achieve the objective. A similar case study is there on our blog – NLP in Agriculture.
    Sanjaya Kumar Saxena
    @sanjayaksaxena

    Here is a simple example that may help you:

    var NER = require( 'wink-ner' );
    var Tokenizer = require( 'wink-tokenizer' );

    // Create a tokenizer function and a NER instance.
    var tokenize = Tokenizer().tokenize;
    var ner = NER();

    // Each entry is one surface pattern, written as space-separated tokens.
    // Entries that share a uid are reported as the same entity.
    var trainingData = [
      { text: 'c + +', entityType: 'skill', uid: 'C++' },
      { text: 'c #', entityType: 'skill', uid: 'C#' },
      { text: 'php', entityType: 'skill', uid: 'PHP' },
      { text: 'my sql', entityType: 'skill', uid: 'MySQL' },
      { text: 'mysql', entityType: 'skill', uid: 'MySQL' },
      { text: 'python', entityType: 'skill', uid: 'Python' },
      { text: 'javascript', entityType: 'skill', uid: 'Javascript' },
      { text: 'java script', entityType: 'skill', uid: 'Javascript' },
      { text: 'nodejs', entityType: 'skill', uid: 'Node.js' },
      { text: 'node js', entityType: 'skill', uid: 'Node.js' },
      { text: 'web design', entityType: 'skill', uid: 'Web Design' },
    ];
    ner.learn( trainingData );

    var r = 'I have worked in C++, node js, MY SQL, extensively and have limited web design experience! My email is r2d2@gmail.com.';

    // Tokenize first; recognition runs on the resulting tokens.
    var tokens = tokenize( r );

    tokens = ner.recognize( tokens );

    tokens.forEach( ( t ) => {
      // Learned entities carry a uid; emails are tagged by the tokenizer.
      if ( t.uid ) console.log( `Skill: ${t.uid}` );
      if ( t.tag === 'email' ) console.log( `E-Mail: ${t.value}` );
    } );

    This produces the following output:

    Skill: C++
    Skill: Node.js
    Skill: MySQL
    Skill: Web Design
    E-Mail: r2d2@gmail.com

    Please download the latest version of wink-ner and use it.

    Hi @thecodingcrow, do let us know if you need any further help. Do download the latest version of wink-ner!
    Ender Minyard
    @genderev
    Hi, I’m hoping to build a very intelligent search engine in the browser.
    I’ve been using string similarity algorithms but I’m frustrated because they don’t recognize the intent of the user.
    I’m excited to explore winkjs 🙂
    Rachna Chakraborty
    @rachnachakraborty
    @genderev Thank You.
    @genderev Please feel free to write to us for any inputs on use of winkjs packages. All the best!
    Ender Minyard
    @genderev
    How do you use the winkJS node modules in the browser? I tried using browserify for the bm25 text search with no success.
    Prateek Saxena
    @prtksxna
    genderev: Were you able to get it running?
    Ateeq
    @wenning247_twitter
    Does Wink support the Arabic language in sentiment analysis?
    Or how do I train the model to do such a thing?
    Labs
    @labs20
    Hi there. Great work! Are word2vec and doc2vec in your plans?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Thanks! Yes, they are on our roadmap and will be released in some time.
    Ron Dahan
    @RonDaha
    Hey, is there a way to also configure a context for an entity? Let's say I am looking for an entity but need to check whether it is found in the context of another entity or of certain words. This is regarding the NLP package's learnCustomEntities().
    Pallavi Ratra
    @pallaviratra
    Hello, could you please elaborate on your query with an example? That'll help us in giving you the appropriate response!
    Rakesh PK
    @pkrakesh_twitter
    Can we use Wink js with react native?
    Alex Harwood
    @alexanderpharwood
    Hiya, I am using Wink (thank you very much for the project), to extract keywords from PDFs. I am wondering if it is possible to extend the dictionary (lexicon?) of the tokeniser so that it recognises multiple words as one token, for instance "Microsoft office", or "United Kingdom". Perhaps there is another feature I should be using for this instead of the tokeniser? Apologies if this has already been documented and I have missed it! Any help appreciated. Cheers
    Alex Harwood
    @alexanderpharwood
    After some further research, the addRegex method seems to do exactly what I want. I have a rather large list of custom tokens, do you imagine there will be any significant performance issues if I were to add a few thousand for example?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Please look at https://winkjs.org/wink-nlp/custom-entities.html; it should fulfill your need. Even with a large data set you should get good performance.
    Alex Harwood
    @alexanderpharwood
    Brilliant -- much appreciated!
    Wenzel
    @creadicted

    Really nice work you have done here!

    Do you have a recommendation for how to extract the entity (if one is associated) for each word in a sentence? So far I connect the information from different sources of 'doc', and it feels really hacky. Also, when extracting the entities, is there a way to find out what type each one is?

    Pallavi Ratra
    @pallaviratra
    Hi @creadicted, you can look at the example here https://winkjs.org/wink-nlp/entities.html to see how to extract entities from a specific sentence. To extract the entity types you can use the "its.detail" property explained here: https://winkjs.org/wink-nlp/its-as-helper.html
    Wenzel
    @creadicted
    Thank you! So there is no POS-like command for entities? Then I am not building something that someone else has already made :)
    hariom-sinha58
    @hariom-sinha58
    I want to classify my tags of words or tokens into categories. Is this possible here?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Hi @hariom-sinha58 can you please give an example of what exactly you are looking for?
    hariom-sinha58
    @hariom-sinha58
    I have a form builder where I let users add tags such as their skills or hobbies. So let's say they selected JavaScript, Cricket, Football, Harry Potter, Python; based on these tokens I would like to be able to classify that person's interests into sports, coding, other, etc.
    Thanks for quick response :)
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Hi @hariom-sinha58, one of the simplest approaches would be to use custom entities (https://winkjs.org/wink-nlp/custom-entities.html), where you can map each word or words to its category (i.e. entity), e.g. { name: 'sports', patterns: [ '[cricket|football]' ] }.
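    The word-to-category mapping described above can be sketched without any library. This is a plain lookup table, not winkNLP's custom-entity matcher; the category lists and the classifyTags helper are assumptions made up for this illustration, with unknown tags falling back to 'other'.

```javascript
// Toy tag classifier: maps each tag to a category, defaulting to 'other'.
// The category lists are illustrative assumptions, not winkNLP data.
const categories = {
  sports: [ 'cricket', 'football' ],
  coding: [ 'javascript', 'python' ]
};

// Build a reverse index from lowercased tag to category name.
const tagToCategory = new Map();
Object.keys( categories ).forEach( ( name ) => {
  categories[ name ].forEach( ( tag ) => tagToCategory.set( tag, name ) );
} );

// Group tags by category; unknown tags fall into 'other'.
function classifyTags( tags ) {
  const result = {};
  tags.forEach( ( tag ) => {
    const name = tagToCategory.get( tag.toLowerCase() ) || 'other';
    ( result[ name ] = result[ name ] || [] ).push( tag );
  } );
  return result;
}

console.log( classifyTags( [ 'JavaScript', 'Cricket', 'Football', 'Harry Potter', 'Python' ] ) );
```

    A lookup like this only covers tags that were listed in advance, which is the same trade-off as any gazetteer-style approach.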
    hariom-sinha58
    @hariom-sinha58
    Thank you, Sir, for your quick response. I implemented the same use case today using your Named Entity Recognition approach and, kudos, it works great. There are a few findings from my side, though, where the model's output was not quite what I expected. I am still looking into that to figure out whether it is a training issue. That aside, I need a minified version of wink to put directly into the JS. Would that be possible?

    Findings:

    var trainingData = [
    { text: 'c++', entityType: 'core-skill'},
    { text: 'c#', entityType: 'core-skill' },
    ];
    ner.learn( trainingData );

    var r = 'cricket c# c++ football php mysql my sql.'

    The output fails for c# and c++. I can't understand how the model is behaving here: I am getting tokenised values such as c,+,+ and c#,# with no proper output. But if I give the training text as c + + or c # (i.e. with spaces), it works.

    One last query: can the Sentiment API work with multiple languages?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    Hi @hariom-sinha58, this is the expected behavior, as custom entity detection happens on "tokens"; this is mentioned in the link shared with you. Currently we have a model for the English language only. However, winkNLP is capable of supporting multiple languages.
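    Since matching happens on tokens, a training pattern has to be written the way the tokenizer splits the input. Below is a minimal sketch of that normalization step. The regex tokenizer here is a simplified assumption for illustration and is not wink-tokenizer's actual rule set; toyTokenize and toPatternText are hypothetical helper names.

```javascript
// Toy tokenizer: splits into alphanumeric runs and single symbols.
// An assumption for illustration; wink-tokenizer's actual rules are richer.
function toyTokenize( text ) {
  return text.toLowerCase().match( /[a-z0-9]+|[^a-z0-9\s]/g ) || [];
}

// Rewrite a training pattern as space-separated tokens so it lines up
// with how the input text will be tokenized.
function toPatternText( text ) {
  return toyTokenize( text ).join( ' ' );
}

const trainingData = [ 'c++', 'c#', 'my sql' ].map( ( text ) => ( {
  text: toPatternText( text ),
  entityType: 'core-skill'
} ) );

console.log( trainingData.map( ( e ) => e.text ) );
```

    In practice the same tokenizer that processes the input text should be the one used to normalize the training patterns, so that both sides are split identically.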
    hariom-sinha58
    @hariom-sinha58
    Thanks Sir for all the valuable inputs.
    hariom-sinha58
    @hariom-sinha58
    Hi,

    While bundling the library for
    const winkNLP = require( 'wink-nlp' );
    const model = require( 'wink-eng-lite-model' );
    const nlp = winkNLP( model );

    I am getting the error below:

    bundle.js:2172 Uncaught TypeError: require.resolve is not a function
    at Object.loadModel [as core] (bundle.js:2172)
    at load (bundle.js:177730)
    at nlp (bundle.js:177972)
    at Object.5.wink-eng-lite-model (bundle.js:2028)
    at o (bundle.js:1)
    at r (bundle.js:1)
    at bundle.js:1

    Any hints on where I am going wrong? It seems to work well with normal JS, but with Browserify I get this issue.

    Sanjaya Kumar Saxena
    @sanjayaksaxena
    @hariom-sinha58 please use wink-eng-lite-web-model instead of wink-eng-lite-model whenever you need to browserify. Please refer to https://winkjs.org/wink-nlp/how-to-run-wink-nlp-in-browser.html for more details.
    hariom-sinha58
    @hariom-sinha58
    Ok. Thanks Sir.
    One quick question, Sir: are all the libraries exposed by wink async in nature?
    Rachna Chakraborty
    @rachnachakraborty
    Hi @hariom-sinha58, winkjs APIs are not async, but they can be wrapped in async calls to deliver the desired output. An example of such code can be found on our site: https://winkjs.org/wink-nlp/how-to-run-nlp-on-pdf.html
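    As a rough sketch of wrapping a synchronous call in an async one, the helper below returns a Promise for any synchronous function. It is not taken from the linked page; runAsync is a hypothetical helper and countWords is a stand-in for a synchronous NLP call. Note that this only defers the call to a later event-loop turn, so the work itself still runs on the main thread.

```javascript
// Wrap a synchronous function so callers get a Promise.
// The call is deferred to a later event-loop turn; the work itself still
// runs on the main thread once it starts.
function runAsync( fn, ...args ) {
  return new Promise( ( resolve, reject ) => {
    setTimeout( () => {
      try {
        resolve( fn( ...args ) );
      } catch ( err ) {
        reject( err );
      }
    }, 0 );
  } );
}

// Hypothetical stand-in for a synchronous NLP call.
function countWords( text ) {
  return text.trim().split( /\s+/ ).length;
}

runAsync( countWords, 'winkjs APIs are synchronous' )
  .then( ( n ) => console.log( `Words: ${n}` ) );
```

    Errors thrown by the wrapped function surface as a rejected Promise, so callers can handle them with .catch() or try/await.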
    Wen-Chieh Lee
    @wjlee-barco
    In Wink, is there any API to detect whether a sentence is complete or not? Thx
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    @wjlee-barco it has an API to detect sentences in text. It cannot detect whether a sentence is grammatically complete.