Nikit Srivastava edited this page Aug 8, 2018 · 5 revisions

## Implemented experiment types

| Type | Sub-tasks | Description |
|------|-----------|-------------|
| A2KB | C2KB, D2KB, ERec | For a given plain text, the annotator should find named entities and link them to a knowledge base (KB). Note that it is recommended to generate an artificial URI for entities that cannot be found in the KB. |
| C2KB | --- | The annotator adds a list of concepts to a given text without marking their positions inside the text. |
| D2KB | --- | The annotator gets a text with already marked named entities and should link them to a KB. Note that it is recommended to generate an artificial URI for entities that cannot be found in the KB. |
| ERec | --- | The annotator gets a plain text and should find all entities. Linking to a KB is not needed. |
| ETyping | --- | The annotator gets a text with already marked named entities and should assign KB types to these entities. |
| KE | OKETask1, OKE2018Task4 | The annotator gets a text and shall recognize the entities inside it and their types, along with their relations. |
| OKETask1 | ERec, D2KB, ETyping | Task 1 of the OKE Challenge 2015. For a given plain text, the annotator should find named entities, assign a type to them, and link them to a KB. Note that it is recommended to generate an artificial URI for entities that cannot be found in the KB. Note that the reported Precision, Recall and F1-measure are the averages of the values achieved in the sub-tasks. |
| OKETask2 | --- | Task 2 of the OKE Challenge 2015. The annotator gets a text in which a single named entity has already been annotated. The annotator should find the type description of this entity and mark its position inside the text. Additionally, this type should be generalized to one of the DOLCE+DnS Ultra Lite classes. |
| OKE2018Task4 | A2KB, RE | The annotator gets a text and shall recognize the entities inside it and the relationships between them. |
| RE | --- | The annotator gets a text and several already linked entities and shall recognize their relationships inside the text. |
| RT2KB | ERec, ETyping | The annotator gets a text and shall recognize the entities inside it and their types. |
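To make the differences between the task outputs concrete, the following sketch models them as simple data classes. All class and field names here are illustrative and are not part of GERBIL's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Span:
    """ERec output: a marked entity mention (position and length in the text)."""
    start: int
    length: int

@dataclass
class MeaningSpan(Span):
    """A2KB/D2KB output: a mention linked to a KB URI (or an artificial URI
    if the entity cannot be found in the KB)."""
    uri: str = ""

@dataclass
class TypedSpan(Span):
    """ETyping output: a mention together with its KB types."""
    types: Tuple[str, ...] = ()

# An A2KB annotator produces linked mentions; dropping the URI yields the
# ERec view, so an A2KB result can also be evaluated as an ERec result.
a2kb_result: List[MeaningSpan] = [
    MeaningSpan(0, 12, "http://dbpedia.org/resource/Example")
]
erec_view: List[Span] = [Span(m.start, m.length) for m in a2kb_result]
```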

## Relations between experiment types

GERBIL supports complex experiment types that can be separated into smaller sub-types. This leads to a hierarchy of types, since a system that can solve a complex type covers the simpler tasks as well. For example, a system that can solve an A2KB task is also able to recognize (ERec) and link (D2KB) entities, even if these two tasks are not explicitly implemented as separate steps.
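This decomposition can be sketched as a transitive closure over the sub-task table above. The mapping below mirrors that table; the type names are kept as plain strings for illustration only.

```python
from typing import Dict, Set

# Sub-task table as listed above; "---" entries simply have no mapping here.
SUBTASKS: Dict[str, Set[str]] = {
    "A2KB": {"C2KB", "D2KB", "ERec"},
    "KE": {"OKETask1", "OKE2018Task4"},
    "OKETask1": {"ERec", "D2KB", "ETyping"},
    "OKE2018Task4": {"A2KB", "RE"},
    "RT2KB": {"ERec", "ETyping"},
}

def covered_tasks(exp_type: str) -> Set[str]:
    """All experiment types a system solving exp_type covers, incl. itself."""
    covered = {exp_type}
    for sub in SUBTASKS.get(exp_type, set()):
        covered |= covered_tasks(sub)
    return covered

# A system solving A2KB also covers ERec, D2KB and C2KB:
print(sorted(covered_tasks("A2KB")))  # ['A2KB', 'C2KB', 'D2KB', 'ERec']
```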

The following figure shows the different experiment types. A system covering a "higher" experiment type is also able to solve all lower types. The same holds for datasets.

![Hierarchical representation of the implemented experiment types in GERBIL](gerbil-hierarchy.png)

## Removed experiment types

During the further development of GERBIL, some older experiment types that were originally described in [1] and [2] have been removed.

The two experiment types Sa2KB and Sc2KB have been integrated into the experiments A2KB and C2KB. This step was necessary to (1) ensure the usage of confidence scores inside the annotator results and (2) avoid the implementation of additional experiment types, e.g., Sd2KB or SERec.
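One way confidence scores can be used inside an A2KB or C2KB evaluation is to sweep a threshold over them and report the best resulting score. The sketch below is only an illustration of that idea under assumed inputs, not GERBIL's actual implementation.

```python
from typing import List, Set, Tuple

def best_threshold_f1(annotations: List[Tuple[str, float]], gold: Set[str]) -> float:
    """annotations: (uri, confidence) pairs; gold: set of correct URIs.
    Try each confidence value as a cut-off and return the best micro F1."""
    best = 0.0
    thresholds = sorted({conf for _, conf in annotations}) + [float("inf")]
    for t in thresholds:
        kept = {uri for uri, conf in annotations if conf >= t}
        tp = len(kept & gold)
        precision = tp / len(kept) if kept else 1.0
        recall = tp / len(gold) if gold else 1.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        best = max(best, f1)
    return best

# Toy example: cutting off below 0.4 removes the wrong annotation.
anns = [("kb:Berlin", 0.9), ("kb:Paris", 0.4), ("kb:Error", 0.2)]
gold = {"kb:Berlin", "kb:Paris"}
print(best_threshold_f1(anns, gold))  # 1.0
```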

The experiment type Rc2KB has been removed for several reasons.

1. From our point of view, the hierarchy of C2KB, Rc2KB and Sc2KB described in [2] is not correct, because the confidence scores of the Sc2KB part do not express importance, while the definition of the Rc2KB task requires the concepts to be ranked by their importance for the text.
2. Since neither NIF nor other standard vocabularies define a property for assigning a rank or an importance score to an annotation, we would have had to define our own property. However, this contradicts our goal of using standardized vocabularies for the communication between GERBIL and the annotators.
3. In our experience, the usage of this experiment type was not high enough to justify the amount of time that would have been needed to solve these problems.

## References

[1] Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix, Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis and Lars Wesemann. GERBIL -- General Entity Annotation Benchmark Framework. In Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2015).

[2] Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita. A Framework for Benchmarking Entity-Annotation Systems. In Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2013).