lisa563
@lisa563
OK, thank you.
lisa563
@lisa563
I'm very sorry, I still have some questions.
When annotating, if the content of a label spans two lines, what should I do to mark it all at once?
Roughly how many files do I need to annotate before the predictions become more accurate?
lisa563
@lisa563
After I annotated 150 files, there are still many predictions with a confidence of 0.1 or 0.2. How can I reduce these low-confidence predictions? Are my annotated files not enough, or does something need to be configured?
Jan-Christoph Klie
@jcklie
@lisa563 Regarding the recommender, that totally depends on your task and the recommender you choose. What exactly are you annotating and using? You can remove low-confidence suggestions automatically by setting a threshold in the recommender settings; suggestions from that recommender below the threshold are then not shown.
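The threshold filtering Richard describes can be sketched in a few lines. This is purely illustrative Python, not INCEpTION's actual code; the suggestion data structure is made up for the example:

```python
# Illustrative sketch of confidence-threshold filtering, as done by the
# recommender settings. The dict structure here is invented for the demo.
suggestions = [
    {"label": "PERSON", "score": 0.92},
    {"label": "ORG", "score": 0.15},
    {"label": "LOC", "score": 0.71},
]

THRESHOLD = 0.7  # suggestions scoring below this are hidden from the UI

visible = [s for s in suggestions if s["score"] >= THRESHOLD]
print(visible)  # only the 0.92 and 0.71 suggestions remain
```

The low-scoring suggestions still exist in the model's output; the threshold only controls whether they are shown.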
Richard Eckart de Castilho
@reckart
@lisa563 out of curiosity: did you annotate those 150 files on our demo server?
lisa563
@lisa563
For the recommender, I set the tool to Multi-Token Sequence Classifier (OpenNLP NER) and set the score threshold to 0.7, but there are still many predictions with a confidence lower than 0.7.
Yes, I annotated those 150 files on your demo server.
Richard Eckart de Castilho
@reckart
@lisa563 please be aware there is no security on the demo server, as everybody uses the shared demo account, and we even wipe it regularly. We can only warn against doing any serious work on the demo server.
Richard Eckart de Castilho
@reckart
@lisa563 regarding the confidence threshold: I have opened an issue for it: inception-project/inception#2028
lisa563
@lisa563
Thanks
lisa563
@lisa563
When I made annotations in the Named Entity layer, I set the recommender to use the Multi-Token Sequence Classifier (OpenNLP NER) tool, so I couldn't select content spanning multiple lines at once. But when "Allow crossing sentence boundaries" is checked in the layer configuration, the OpenNLP NER tool can no longer be used. What should I do?
Richard Eckart de Castilho
@reckart
sequence labelling across sentence boundaries is not supported. If you want to annotate entire sentences, set the annotation granularity from "tokens" to "sentences" and then you can choose another, more suitable type of OpenNLP recommender. Mind you, you might have to create a custom span layer for that - I do not know if the builtin NE layer accepts being configured to entire sentences.
lisa563
@lisa563
After setting the annotation granularity from "tokens" to "sentences", what type of OpenNLP recommender is suitable?
lisa563
@lisa563
I found that you use the "." symbol to divide sentence boundaries, but many people's names contain a ".", so they get divided into multiple sentences and cannot be fully labeled - for example: Dr. Anthony Arnett, Yolonda Y. Green. Can this problem be solved?
Richard Eckart de Castilho
@reckart
if you import as "plain text" or any other format which does not define sentence boundaries, then a default splitter is applied
currently, we have no way of selecting a different splitter
but you can import using "plain text (one sentence per line)"
or even "plain text (sentence per line, space-separated tokens)" to have control over token / sentence splitting
won't work with PDFs, I'm afraid
at some point, we plan to make tokens/sentence boundaries editable, but not quite there yet
for layers with sentence-level granularity, the OpenNLP doccat recommender should be offered
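The pre-splitting Richard suggests can be sketched as a small preprocessing step. This is a rough, illustrative Python snippet, not part of INCEpTION; the abbreviation list is a made-up example you would extend for your own data:

```python
# Rough sketch: prepare text for the "plain text (one sentence per line)"
# import format, so the importer's default splitter is bypassed and
# abbreviations like "Dr." or middle initials do not end a sentence.
# ABBREV is an invented example list - extend it for your data.
ABBREV = {"Dr.", "Mr.", "Mrs.", "Y."}

def to_sentence_per_line(text):
    sentences, current = [], []
    for tok in text.split():
        current.append(tok)
        # End a sentence at ./!/? unless the token is a known abbreviation.
        if tok[-1] in ".!?" and tok not in ABBREV:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return "\n".join(sentences)

text = "Dr. Anthony Arnett met Yolonda Y. Green. They spoke briefly."
print(to_sentence_per_line(text))
```

Note this keeps punctuation attached to the preceding word; for the "space-separated tokens" variant of the format you would additionally split off trailing punctuation as its own token.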
lisa563
@lisa563
Thank you very much.
lisa563
@lisa563
Can the relation between entities be predicted? How should I configure it?
Richard Eckart de Castilho
@reckart
Relation prediction is being worked on on a branch
Richard Eckart de Castilho
@reckart
@lisa563 btw, a proper confidence threshold setting on the recommender sidebar is also pretty much done now and will be in the 0.19.0 release when it comes out
lisa563
@lisa563
Thank you.
Ivan Habernal
@habernal
hey all, I'm trying to find out whether there is a quick fix for wrapping longer annotation spans. Looking into the HTML code, it seems it's a SVG text tag, which doesn't look like supporting text wrapping - so there is perhaps no quick fix, correct?
Richard Eckart de Castilho
@reckart
At the moment you only have the option of pre-wrapping the text before import and then using the “brat line oriented” display mode
Ivan Habernal
@habernal
Alright, I see, that's what I thought
I also tested the HTML mode, which is in fact not bad; the only thing is that you cannot select and remove existing annotations
Richard Eckart de Castilho
@reckart
It is a known issue even in brat
You can
When you move the mouse over an annotation, a popup appears
Click on that to select the annotation
Ivan Habernal
@habernal
yeah... well hidden :)
Richard Eckart de Castilho
@reckart
Not the best UX but it works
Ivan Habernal
@habernal
confirmed
ok, say I prepare my documents as HTML with just paragraphs <p> - how about implicit pre-processing (tokenization etc.), will that work?
I mean, I could simply try out but maybe there's some warning signs
Richard Eckart de Castilho
@reckart
implicit tokenization & sentence splitting works with the usual quality provided by the Java BreakIterator
Preparing the HTML with a different tokenizer would also be possible to do externally (via DKPro Core), but not in INCEpTION at this time.
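The external preprocessing Richard mentions could start with something as simple as pulling the paragraph text out of the HTML. A stdlib-only illustrative sketch (this is not an INCEpTION or DKPro Core API, just one way to extract `<p>` contents for separate tokenization):

```python
from html.parser import HTMLParser

# Illustrative sketch: collect the text of each <p> element from a simple
# HTML document, so it can be tokenized/split externally before import.
class ParagraphExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.paragraphs.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)

doc = "<html><body><p>First paragraph.</p><p>Second one.</p></body></html>"
parser = ParagraphExtractor()
parser.feed(doc)
print(parser.paragraphs)  # ['First paragraph.', 'Second one.']
```

Each extracted paragraph could then be run through whatever tokenizer you prefer and re-exported in one of the plain-text import formats.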
Ivan Habernal
@habernal
Thanks! I'll give it a shot
Ivan Habernal
@habernal
I tested it with a very simple HTML document (three paragraphs), and there are some issues - I annotated three words, but on the left-hand side (the document view) it highlighted something different, and in the tool window (right-hand side) the Text field also contains something different. So it's not really working :(
Richard Eckart de Castilho
@reckart
which version?
Ivan Habernal
@habernal
0.18.3 (2021-03-09 20:02:49, build bf16e970)
Richard Eckart de Castilho
@reckart
I had tested with a simple two-paragraph HTML document ;) looked ok for me. So we'll have to look again. Could you provide your test document in an issue?
Ivan Habernal
@habernal
These are not really public documents
Richard Eckart de Castilho
@reckart
ok "very simple HTML" sounded like you had cooked up a test document