wink-ner is a gazetteer-based (i.e. lookup-driven) NER that can spot patterns smartly. It will therefore spot skills, cities, etc. easily, but names and experience could be tricky, especially if you are looking for generalization. We will try to share some ideas on how you can achieve some of this in the next couple of days. If you have annotated data, you could consider using wink-perceptron to achieve the objective. A similar case study is on our blog: NLP in Agriculture.
Here is a simple example that may help you:
var NER = require( 'wink-ner' );
var Tokenizer = require( 'wink-tokenizer' );
var tokenize = Tokenizer().tokenize;
var ner = NER();
var trainingData = [
{ text: 'c + +', entityType: 'skill', uid: 'C++' },
{ text: 'c #', entityType: 'skill', uid: 'C#' },
{ text: 'php', entityType: 'skill', uid: 'PHP' },
{ text: 'my sql', entityType: 'skill', uid: 'MySQL' },
{ text: 'mysql', entityType: 'skill', uid: 'MySQL' },
{ text: 'python', entityType: 'skill', uid: 'Python' },
{ text: 'javascript', entityType: 'skill', uid: 'Javascript' },
{ text: 'java script', entityType: 'skill', uid: 'Javascript' },
{ text: 'nodejs', entityType: 'skill', uid: 'Node.js' },
{ text: 'node js', entityType: 'skill', uid: 'Node.js' },
{ text: 'web design', entityType: 'skill', uid: 'Web Design' },
];
ner.learn( trainingData );
var r = 'I have worked in C++, node js, MY SQL, extensively and have limited web design experience! My email is r2d2@gmail.com.';
var tokens = tokenize( r );
tokens = ner.recognize( tokens );
tokens.forEach( ( t ) => {
if ( t.uid ) console.log( `Skill: ${t.uid}` );
if ( t.tag === 'email' ) console.log( `E-Mail: ${t.value}` );
} );
This produces the following output:
Skill: C++
Skill: Node.js
Skill: MySQL
Skill: Web Design
E-Mail: r2d2@gmail.com
Please download the latest version of wink-ner and use it.
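If you need more than the uid, the recognized tokens can also be grouped by type. The following is only a rough sketch: the sample tokens are hand-written, the entityType field on recognized tokens is assumed rather than shown in the example above, and groupEntities is a hypothetical helper name, not part of wink-ner.

```javascript
// Hand-written sample in the ASSUMED shape of ner.recognize() output.
// The entityType field on tokens is an assumption; verify it against
// the tokens you actually get back.
var sampleTokens = [
  { value: 'i', tag: 'word' },
  { value: 'c++', tag: 'word', uid: 'C++', entityType: 'skill' },
  { value: 'node js', tag: 'word', uid: 'Node.js', entityType: 'skill' },
  { value: 'r2d2@gmail.com', tag: 'email' }
];

// Hypothetical helper: collects uids (or raw values) under their type.
function groupEntities( tokens ) {
  var groups = {};
  tokens.forEach( function ( t ) {
    // Gazetteer hits carry entityType; e-mails are identified by tag.
    var type = t.entityType || ( t.tag === 'email' ? 'email' : null );
    if ( !type ) return;
    if ( !groups[ type ] ) groups[ type ] = [];
    groups[ type ].push( t.uid || t.value );
  } );
  return groups;
}

console.log( groupEntities( sampleTokens ) );
```

With real data you would pass the array returned by ner.recognize( tokens ) instead of sampleTokens.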
Really nice work you have done here!
Do you have a recommendation for how to extract the entity (if one is associated) for each word in a sentence? So far I combine the information from different sources of 'doc', and it feels really hacky. Also, when extracting the entities, is there a way to find out what type each one is?
{ name: 'sports', patterns: '[cricket|football]' }
Findings:
var trainingData = [
{ text: 'c++', entityType: 'core-skill'},
{ text: 'c#', entityType: 'core-skill' },
];
ner.learn( trainingData );
var r = 'cricket c# c++ football php mysql my sql.'
The output fails for c# and c++. I can't understand how the model is behaving here: I am getting tokenised values as c,+,+ and c#,# with no proper output. But if I give the training text as c + + or c # (i.e. with spaces), it works.
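One possible workaround for this, sketched under the assumption that the tokenizer emits every symbol as its own token (as the c,+,+ output suggests): space out the symbols in each training text before calling ner.learn(). The helper name spaceOutSymbols is hypothetical and not part of wink-ner.

```javascript
// Hypothetical helper (not part of wink-ner): puts spaces around every
// character that is neither a word character nor whitespace, so that
// 'c++' becomes 'c + +'.
function spaceOutSymbols( text ) {
  return text
    .replace( /([^\w\s])/g, ' $1 ' ) // isolate each symbol
    .replace( /\s+/g, ' ' )          // collapse repeated spaces
    .trim();
}

var rawTrainingData = [
  { text: 'c++', entityType: 'core-skill' },
  { text: 'c#', entityType: 'core-skill' }
];

// Rewrite only the text field; everything else is carried over as-is.
var trainingData = rawTrainingData.map( function ( entry ) {
  return { text: spaceOutSymbols( entry.text ), entityType: entry.entityType };
} );

// Logs the texts 'c + +' and 'c #'.
console.log( trainingData.map( function ( e ) { return e.text; } ) );
```

The resulting trainingData can then be passed to ner.learn() as in the earlier example. Note that this also splits entries such as 'node.js' into 'node . js', which may or may not match what the tokenizer does for them.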
While bundling the library for the following code:
const model = require( 'wink-eng-lite-model' );
const nlp = winkNLP( model );
I am getting the error below:
bundle.js:2172 Uncaught TypeError: require.resolve is not a function
at Object.loadModel [as core] (bundle.js:2172)
at load (bundle.js:177730)
at nlp (bundle.js:177972)
at Object.5.wink-eng-lite-model (bundle.js:2028)
at o (bundle.js:1)
at r (bundle.js:1)
at bundle.js:1
Any hints on what I am doing wrong? It seems to work well with normal JS, but with Browserify I am getting this issue.
Use wink-eng-lite-web-model instead of wink-eng-lite-model whenever you need to browserify. Please refer to https://winkjs.org/wink-nlp/how-to-run-wink-nlp-in-browser.html for more details.
const nlp = wink_nlp_1.default(wink_eng_lite_model_1.default);
^
TypeError: wink_nlp_1.default is not a function
at Object.<anonymous> (/Volumes/DATA/Projects/INhouse/backend/dist/parser/articles/article.parser.js:7:31)
at Module._compile (internal/modules/cjs/loader.js:1063:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1092:10)
at Module.load (internal/modules/cjs/loader.js:928:32)
at Function.Module._load (internal/modules/cjs/loader.js:769:14)
at Module.require (internal/modules/cjs/loader.js:952:19)
at require (internal/modules/cjs/helpers.js:88:18)