
Differences to the BAT Framework


The first versions of GERBIL encapsulated the BAT-Framework of Cornolti et al. [1]. Since version 1.2.0, the project is based on a newly developed, independent evaluation core that has already been tested during the OKE Challenge 2015 and has been extended to support nearly all features of the older GERBIL versions. In this article, we summarize the differences between the BAT-Framework and this new evaluation core (and thus the differences between versions 1.1.4 and 1.2.0 of GERBIL).

Entity/URI Matching

The BAT-Framework is based on Wikipedia IDs and is thus bound to Wikipedia as its central knowledge base (KB). Every URI that is found inside a dataset or in the response of an annotator is translated into a Wikipedia URI and then into its Wikipedia ID. If this is not possible, the annotation is discarded.

In GERBIL, we implemented a URI matching that enables evaluation based directly on URIs. This has several advantages. First, GERBIL is agnostic of the KB that is used for linking instead of being bound to Wikipedia. Second, GERBIL is able to process emerging entities, i.e., entities that are not part of the KB and thus cannot get a URI of this KB.
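The following sketch illustrates the idea of KB-agnostic URI matching. The `Annotation` type, the `matches` method, and the exact string comparison are simplifying assumptions made for illustration; GERBIL's actual matching classes are more elaborate.

```java
import java.util.Objects;

// A minimal sketch of KB-agnostic URI matching, assuming a simplified
// annotation type; GERBIL's real matching is more involved.
public class UriMatchingSketch {

    // Hypothetical annotation: a text span plus the URI it was linked to.
    // An emerging entity simply carries a URI from some other namespace;
    // nothing has to be translated into a Wikipedia ID or discarded.
    record Annotation(int start, int length, String uri) {}

    // Two annotations match if they mark the same span and link the same
    // URI, regardless of which KB the URI belongs to.
    static boolean matches(Annotation gold, Annotation system) {
        return gold.start() == system.start()
                && gold.length() == system.length()
                && Objects.equals(gold.uri(), system.uri());
    }

    public static void main(String[] args) {
        Annotation gold = new Annotation(0, 6, "http://dbpedia.org/resource/Berlin");
        Annotation sys = new Annotation(0, 6, "http://dbpedia.org/resource/Berlin");
        System.out.println(matches(gold, sys)); // true: DBpedia URIs compared directly
    }
}
```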

Note that we are aware of a minor disadvantage of our new evaluation core. While loading a dataset, the BAT-Framework is able to identify wrong annotations, i.e., entities that do not exist in Wikipedia, since it cannot retrieve a Wikipedia ID for a non-existent Wikipedia URI. In contrast, our evaluation core is not able to check whether a gold standard is correct. However, we are aware of this issue and plan to tackle it during the upcoming months.
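Such a check could, for instance, probe whether each gold-standard URI still resolves in its KB. The sketch below shows this idea only; the HTTP HEAD probe and the `uriExists` helper are assumptions for illustration, not GERBIL's implementation.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch: flag potentially wrong gold-standard annotations
// by checking whether their entity URIs resolve.
public class UriExistenceCheckSketch {

    static boolean uriExists(String uri) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(uri).openConnection();
        con.setRequestMethod("HEAD");
        con.setInstanceFollowRedirects(true);
        int status = con.getResponseCode();
        con.disconnect();
        // A 2xx response (possibly after redirects) suggests the entity
        // exists; a 404 would flag a suspicious gold-standard annotation.
        return status >= 200 && status < 300;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(uriExists("http://dbpedia.org/resource/Berlin"));
    }
}
```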

General integration of Sx2KB tasks

The BAT-Framework distinguishes between Sa2KB and A2KB tasks as well as between Sc2KB and C2KB tasks. In both cases, the evaluation of the Sx2KB task expects the annotator to add confidence scores to its results. The evaluation of the corresponding x2KB task is then run several times to find the confidence threshold that maximizes the F1-score (see the sketch below). During the first year of GERBIL, our experience showed that this distinction is not useful. Thus, we implemented a general handling of confidence values in all experiment tasks. This means that confidence values added to annotations are considered in all experiments, not only in A2KB or C2KB. Because of this integration, we removed the experiment types Sa2KB and Sc2KB.
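The following sketch shows the threshold search described above: the evaluation is repeated for every candidate threshold, keeping only annotations whose confidence reaches it, and the threshold yielding the highest F1-score wins. The `Scored` type, the `f1At` helper, and the mocked result data are assumptions for illustration; GERBIL's actual evaluation is considerably more involved.

```java
import java.util.Arrays;

// Sketch of a confidence-threshold sweep that maximizes the F1-score.
public class ThresholdSweepSketch {

    // Hypothetical scored annotation: a confidence value plus whether the
    // annotation actually matched the gold standard.
    record Scored(double confidence, boolean correct) {}

    // F1-score when only annotations with confidence >= threshold are kept.
    static double f1At(Scored[] results, int totalGold, double threshold) {
        long kept = Arrays.stream(results)
                .filter(r -> r.confidence() >= threshold).count();
        long tp = Arrays.stream(results)
                .filter(r -> r.confidence() >= threshold && r.correct()).count();
        if (kept == 0 || totalGold == 0) return 0;
        double precision = (double) tp / kept;
        double recall = (double) tp / totalGold;
        return (precision + recall == 0) ? 0
                : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        Scored[] results = { new Scored(0.9, true), new Scored(0.6, true),
                             new Scored(0.4, false), new Scored(0.2, false) };
        int totalGold = 3; // mocked size of the gold standard
        double bestThreshold = 0, bestF1 = -1;
        // Candidate thresholds: the observed confidence values themselves.
        for (Scored r : results) {
            double f1 = f1At(results, totalGold, r.confidence());
            if (f1 > bestF1) { bestF1 = f1; bestThreshold = r.confidence(); }
        }
        System.out.printf("best threshold %.1f -> F1 %.3f%n", bestThreshold, bestF1);
    }
}
```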

Removal of Rc2KB

The BAT-Framework offers the Rc2KB experiment type, an extension of the C2KB task that expects the concepts to be returned in a certain order. In [1], Sc2KB is defined as a superior task of Rc2KB, and every annotator that is able to perform Sc2KB is also able to fulfil Rc2KB. However, it can be argued that the score used for ranking the concepts and the score of the Sc2KB task are not the same: the first is a salience score of the entity, while the second is a confidence score. Since GERBIL does not support salience scores in its current state, we had to remove the Rc2KB task.

Support of additional tasks and metrics

With our new evaluation core, GERBIL is more flexible, and we have added further experiment types, e.g., Entity Typing, as well as additional metrics, neither of which is present in the BAT-Framework.

References

[1] Marco Cornolti, Paolo Ferragina, and Massimiliano Ciaramita. A Framework for Benchmarking Entity-Annotation Systems. In Proceedings of the International World Wide Web Conference (WWW), Practice & Experience Track. ACM, 2013.