    SauerSabrina
    @SauerSabrina
    Hi @lilimelgar, I hope I am in the right room to report bugs in the DIVE+ recipe. This is what I have found so far while working on the demonstration scenario:
    SauerSabrina
    @SauerSabrina
    • Events (in this case, a radio bulletin transcript from the KB) show "strange", unreadable text in the headline, which suggests that these titles may need to be cleaned up. I used Watersnoodramp as the search query. DIVE+ gives me 5 events, of which the 3rd title and description are virtually illegible. See also figure 1 of the demonstration scenario for the exact result I am referring to.
    SauerSabrina
    @SauerSabrina
    • After clicking on one of the events related to the Watersnoodramp, the Title, ID and Description are shown. However, the Description of, for example, the first related event is very difficult to read. To give more detail, this is some of the text:
    A. scherp Datum: 23feh 53 Tijd: 23. Onderwerp: watersnoodramp slacht offers Het totaal aantal slachtogfers van de watersnoodramp /** /volgens de tot nu toe eno laad hooft ^├╝ti-' or" rjL bedraagt^KKxmag& K^xykxgix& a x& Sxx ter beschikking staande -r---fi LMlTt. T!< Tpn Wanneer ^etfde aantallen ge-identificeerde stoffelijke overschotten , /gcbo4^ ' ^* ^en stoffelijke over& chotten/en de aantallen bij het informatiebureau van het Nederlandse oy heden__- /< %
    SauerSabrina
    @SauerSabrina
    • For this particular case, a larger problem is that the difference between the "events" and the "media objects" generated by the search query "Watersnoodramp" in DIVE+ is unclear. The only reason I can come up with is that the results listed under "media objects" are all annotated and recognised as ANP Bulletins, while the results presented under "events" are identified because they are related to the event that was the Watersnoodramp.
    lilimelgar
    @lilimelgar
    Hi Sabrina, yes, this is the right room. The first issue (i.e., unreadable text) originates from the OCR text: when you click on the image, you see that some text was crossed out with "xxxxxxx". This may not be a problem with the data imports, but it could perhaps be solved by enabling OCR correction in the future. By the way, the KB has implemented this functionality in their current lab (http://lab.kb.nl/tool/xportal#live-demo), but for DIVE this is all future work (the "improve entity" facilities, which will be developed further).
    The distinction between "events" and "media objects" is indeed related to the data conversion process. Since "events" are not present in the data, they have to be extracted somehow. What Victor did was to assume that every news bulletin represented an event (I can provide access to the draft paper where he explains this). Since this is creating confusion while browsing, I am going to report it on GitHub once I can think of a way to solve it... shall we think together about what the solution could be?
    SauerSabrina
    @SauerSabrina
    Hi Liliana, I think what is confusing is that the entries currently listed only under "media objects" also contain events, by the same rationale used to generate events. So perhaps it is possible to start extracting events from these objects as well? Or is that far too complicated?
    lilimelgar
    @lilimelgar
    Hi Sabrina, I will forward you an article in which Victor explains the event extraction process for these media objects. Since the article has been submitted for publication, I cannot share it here ;). What Victor explains is that because a dataset (in this case the ANP bulletins) did not explicitly include events, "one Event object was created for each Media Object" and that "the label of this Event object is derived from the OCR'ed description of the Bulletin using simple heuristics". Could you perhaps explain a bit further, or add a screenshot showing why this is not clear from the user perspective? I hope that @biktorrr can also help us clarify the issue ;). Thanks