    Rachna Chakraborty
    @rachnachakraborty
    Good Evening
    Prateek Saxena
    @prtksxna
    Sorry I wasn't looking at this for a while. Not sure if anyone came here, I don't think they did.
    I'll be lurking here regularly from now
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    We need to add chat badge to all repos - a pending ToDo
    prtksxna @prtksxna nods
    Prateek Saxena
    @prtksxna
    When using bm25, is it ok to run addDoc after consolidate, and then running consolidate again?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    documents cannot be added post consolidation
    Prateek Saxena
    @prtksxna
    @sanjayaksaxena: Got it! What would you recommend if I need to start the search but also add documents later?
    @sanjayaksaxena: Thanks
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    @prtksxna as of now, all the earlier documents will have to be added again along with the additional ones, and then consolidated
    Prateek Saxena
    @prtksxna
    @sanjayaksaxena: Understood :)
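The re-add-everything approach described above can be sketched as a small wrapper that keeps every document and rebuilds the index whenever new ones arrive. This is only a sketch under assumptions: `createRebuildableIndex` and `createToyEngine` are hypothetical names, and the toy engine is a stand-in (a plain substring match, not BM25) so the example runs without dependencies. It only mirrors the addDoc/consolidate/search shape mentioned in the conversation.

```javascript
// Hypothetical helper: keeps all documents so the index can be rebuilt.
// `createEngine` is an assumed factory that returns a fresh object with
// addDoc( doc, id ), consolidate() and search( text ) methods.
const createRebuildableIndex = ( createEngine ) => {
  let docs = [];
  let engine = null;

  const rebuild = () => {
    engine = createEngine();
    docs.forEach( ( doc, id ) => engine.addDoc( doc, id ) );
    engine.consolidate();
  };

  return {
    // Adding documents always triggers a full rebuild.
    add: ( newDocs ) => {
      docs = docs.concat( newDocs );
      rebuild();
    },
    search: ( text ) => ( engine ? engine.search( text ) : [] )
  };
};

// Stand-in engine for illustration only; it is NOT wink-bm25-text-search.
const createToyEngine = () => {
  const store = [];
  let consolidated = false;
  return {
    addDoc: ( doc, id ) => {
      if ( consolidated ) throw new Error( 'documents cannot be added post consolidation' );
      store.push( { id: id, body: doc.body.toLowerCase() } );
    },
    consolidate: () => { consolidated = true; },
    // Returns the ids of documents whose body contains the query text.
    search: ( text ) => store
      .filter( ( d ) => d.body.includes( text.toLowerCase() ) )
      .map( ( d ) => d.id )
  };
};

const index = createRebuildableIndex( createToyEngine );
index.add( [ { body: 'Wink is an NLP suite' }, { body: 'BM25 ranks documents' } ] );
index.add( [ { body: 'Consolidate before search' } ] ); // rebuilds with all 3 docs
console.log( index.search( 'consolidate' ) ); // [ 2 ]
```

A full rebuild on every addition is simple but gets slower as the collection grows, so batching several new documents into one `add` call is probably worthwhile.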
    Nishant
    @nishantrpai
    pretty cool project i have to say
    what are all the languages which are currently supported?
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    @nishantrpai thank you :)
    @nishantrpai targeted for nodejs/javascript developers
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    We are announcing a big change for wink today: http://winkjs.org/blog/a-more-permissive-license.html
    allnulled
    @allnulled
    Hello
    Rachna Chakraborty
    @rachnachakraborty
    hi there
    Arye Shalev
    @pantchox
    Hello, great job on the suite of libs!
    wanted to know if there is a possibility that the tokenizer will output VB/NN/XX instead of just "word"
    Rachna Chakraborty
    @rachnachakraborty
    @pantchox Thank you for using wink packages.
    Rachna Chakraborty
    @rachnachakraborty
    The tokenizer is splitting the given text into valid tokens and publishing the token type as an output. For pos tags you will have to use wink-pos-tagger. The output will be in this format: [ { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' }, // { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' }, // { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' }, // { value: 'to', tag: 'word', normal: 'to', pos: 'TO' }, // { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' }
    Rachna Chakraborty
    @rachnachakraborty
    Here is a better formatted output:
    [
      { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' }, 
      { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' }, 
      { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' },
      { value: 'to', tag: 'word', normal: 'to', pos: 'TO' }, 
      { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' }
    ]
    you can easily .map this array to any format that you may require.
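As an illustration of the .map suggestion, here is a minimal sketch that reshapes the tagged tokens shown above. The array is copied by hand from that output rather than produced by a tagger call, and the variable names are only illustrative.

```javascript
// Tagged tokens, copied from the formatted output above.
const taggedTokens = [
  { value: 'He', tag: 'word', normal: 'he', pos: 'PRP' },
  { value: 'is', tag: 'word', normal: 'is', pos: 'VBZ', lemma: 'be' },
  { value: 'trying', tag: 'word', normal: 'trying', pos: 'VBG', lemma: 'try' },
  { value: 'to', tag: 'word', normal: 'to', pos: 'TO' },
  { value: 'fish', tag: 'word', normal: 'fish', pos: 'VB', lemma: 'fish' }
];

// Reshape each token into a "value/POS" string.
const pairs = taggedTokens.map( ( t ) => `${t.value}/${t.pos}` );
console.log( pairs.join( ' ' ) ); // He/PRP is/VBZ trying/VBG to/TO fish/VB

// Or keep only the POS tags.
const posOnly = taggedTokens.map( ( t ) => t.pos );
console.log( posOnly ); // [ 'PRP', 'VBZ', 'VBG', 'TO', 'VB' ]
```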
    Arye Shalev
    @pantchox
    thanks
    thecodingcrow
    @thecodingcrow
    I have a question about the wink-ner package: is there a specific format needed for the training data? The example uses "text" and "entityType"; is this relevant?
    Prateek Saxena
    @prtksxna
    Hey @thecodingcrow 👋🏽
    So the training data needs to be an array of objects where the properties text and entityType are required
    You can optionally add more properties like uid and value that'll be returned as is if that entity is found
    @thecodingcrow, you can read more about this here: https://winkjs.org/wink-ner/NER.html#learn (see the Parameters section after the code example)
    Prateek Saxena
    @prtksxna
    Hope that helps! Let me know if you run into anything else 😊
    thecodingcrow
    @thecodingcrow
    Hey @prtksxna :) thanks for your fast response and nice explanation. I already stumbled across this file, things should be clear by now. What I am still wondering about is whether uid and value have a special function in terms of NER. I am very new to this
    Rachna Chakraborty
    @rachnachakraborty
    Hey @thecodingcrow, to understand why value and uid are needed, let's look at the example on RunKit. It shows that tokenization is a prerequisite to Named Entity Recognition (NER): a text is first tokenized before the entities in it are identified. The value represents the contents of a token, and uid represents the unique id given to the various text patterns that represent a single entity. While preparing content for the learn() API, multiple patterns of an entity can be defined with one unique identity (for example, uid as uk), which helps the recognize() API detect these patterns as one entity only. You can try this by learning u.k./UK/United Kingdom with uid as uk and testing the outcomes with various combinations of these patterns. Hope this is useful. Cheers!
    thecodingcrow
    @thecodingcrow

    Thanks for this explanation, now it is clear for me what the uid is used for. :)
    I was already able to set up my own NE recognizer, but it is not quite doing what I had expected.
    I want to build a resume parser, and for this I'm trying to get all the needed information with NER. I used a training data set from the internet, which I manipulated to get the form { text: "sample", entityType: "sample" }. After I applied .learn() and .recognize(), nearly no entity was found correctly; everything was either word, punctuation or alien. I wanted to look for names, skills, experience, etc.
    I had a look at the data set and my idea is that the recognizer is kind of overfitted. (The data set consists mostly of Indian resumes and the 'text' values are quite long sometimes, for example "C# (1 year), C++, JS".)
    My question now is: is there a way to really teach the recognizer what I am looking for, or is it just checking if the desired strings are found anywhere in the tokens?

    Sorry for the spam, but I wanted to make things clear :)
    Thanks in advance for any help, I really appreciate this chat!

    either*
    Sanjaya Kumar Saxena
    @sanjayaksaxena
    wink-ner is a gazetteer-based (i.e. look-up driven) NER which can spot patterns smartly. Therefore it will be able to spot skills, cities etc. easily, but names and experience could be tricky, especially if you are looking for generalization. We will try and share some ideas on how you can achieve some of it in the next couple of days. If you have annotated data, then you could consider using wink-perceptron to achieve the objective. A similar case study is on our blog: NLP in Agriculture.
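To make "look-up driven" concrete, here is a minimal sketch of a gazetteer-style recognizer in plain JavaScript. It is an illustration of the general idea only, not wink-ner's actual implementation; `buildGazetteer` and `recognize` are hypothetical names, and the matching is a simple longest-phrase-first lookup.

```javascript
// Hypothetical gazetteer: maps a lower-cased phrase to its entity details.
const buildGazetteer = ( entries ) => {
  const table = new Map();
  let maxWords = 1;
  entries.forEach( ( e ) => {
    const words = e.text.toLowerCase().split( ' ' );
    maxWords = Math.max( maxWords, words.length );
    table.set( words.join( ' ' ), { entityType: e.entityType, uid: e.uid } );
  } );
  return { table, maxWords };
};

// Scans the words left to right, trying the longest phrase first.
const recognize = ( gazetteer, words ) => {
  const found = [];
  let i = 0;
  while ( i < words.length ) {
    let matched = false;
    const longest = Math.min( gazetteer.maxWords, words.length - i );
    for ( let n = longest; n >= 1; n -= 1 ) {
      const phrase = words.slice( i, i + n ).join( ' ' ).toLowerCase();
      const entity = gazetteer.table.get( phrase );
      if ( entity ) {
        found.push( entity.uid );
        i += n;
        matched = true;
        break;
      }
    }
    if ( !matched ) i += 1;
  }
  return found;
};

const gazetteer = buildGazetteer( [
  { text: 'uk', entityType: 'country', uid: 'uk' },
  { text: 'united kingdom', entityType: 'country', uid: 'uk' },
  { text: 'python', entityType: 'skill', uid: 'Python' }
] );

// Listed phrases are found, and both country patterns share one uid.
console.log( recognize( gazetteer, 'I learned Python in the United Kingdom'.split( ' ' ) ) );
// [ 'Python', 'uk' ]

// Anything not listed (a person's name, an unlisted skill) is never found.
console.log( recognize( gazetteer, 'Alice knows Rust'.split( ' ' ) ) ); // []
```

The second call shows why open-ended categories such as names do not generalize with a pure look-up: the recognizer can only return what was listed in advance.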
    Sanjaya Kumar Saxena
    @sanjayaksaxena

    Here is a simple example that may help you:

    var NER = require( 'wink-ner' );
    var Tokenizer = require( 'wink-tokenizer' );
    
    // Declare with var to avoid creating implicit globals.
    var tokenize = Tokenizer().tokenize;
    var ner = NER();
    
    // Several text patterns can share one uid, so they resolve to one entity.
    var trainingData = [
      { text: 'c + +', entityType: 'skill', uid: 'C++' },
      { text: 'c #', entityType: 'skill', uid: 'C#' },
      { text: 'php', entityType: 'skill', uid: 'PHP' },
      { text: 'my sql', entityType: 'skill', uid: 'MySQL' },
      { text: 'mysql', entityType: 'skill', uid: 'MySQL' },
      { text: 'python', entityType: 'skill', uid: 'Python' },
      { text: 'javascript', entityType: 'skill', uid: 'Javascript' },
      { text: 'java script', entityType: 'skill', uid: 'Javascript' },
      { text: 'nodejs', entityType: 'skill', uid: 'Node.js' },
      { text: 'node js', entityType: 'skill', uid: 'Node.js' },
      { text: 'web design', entityType: 'skill', uid: 'Web Design' },
    ];
    ner.learn( trainingData );
    
    var r = 'I have worked in C++, node js, MY SQL, extensively and have limited web design experience! My email is r2d2@gmail.com.';
    
    // Tokenize first, then let the recognizer tag the entities it has learned.
    var tokens = tokenize( r );
    
    tokens = ner.recognize( tokens );
    
    // Recognized skills carry a uid; emails are tagged by the tokenizer itself.
    tokens.forEach( ( t ) => {
      if ( t.uid ) console.log( `Skill: ${t.uid}` );
      if ( t.tag === 'email' ) console.log( `E-Mail: ${t.value}` );
    } );

    This produces the following output:

    Skill: C++
    Skill: Node.js
    Skill: MySQL
    Skill: Web Design
    E-Mail: r2d2@gmail.com

    Please download the latest version of wink-ner and use it.

    Hi @thecodingcrow, do let us know if you need any further help. Do download the latest version of wink-ner!
    genderev
    @genderev
    Hi, I'm hoping to build a very intelligent search engine in the browser.
    I've been using string similarity algorithms but I'm frustrated because they don't recognize the intent of the user.
    I'm excited to explore winkjs 🙂
    Rachna Chakraborty
    @rachnachakraborty
    @genderev Thank You.
    @genderev Please feel free to write to us for any inputs on use of winkjs packages. All the best!
    genderev
    @genderev
    How do you use the winkJS node modules in the browser? I tried using browserify for the bm25 text search with no success.
    Prateek Saxena
    @prtksxna
    genderev: Were you able to get it running?
    Ateeq
    @wenning247_twitter
    Does Wink support the Arabic language in sentiment analysis?
    Or how do I train the model to do such a thing?