Michael Röder edited this page Oct 19, 2017 · 1 revision

GERBIL is language-agnostic: it ignores language information entirely. This has both advantages and disadvantages. The main advantage is that GERBIL works for every language that can be expressed as text. The main disadvantage is that users have to make sure themselves that an annotation system is able to process the documents of the dataset they use for benchmarking.

Retrieve the language of a dataset

The language of a dataset cannot be retrieved at the moment. It has to be known beforehand, or the annotation system has to be able to detect the language itself.

This might change in the future (see #35).

Configure the language of a system

This is highly system-dependent. In general, it is the user's task to make sure that the system being benchmarked is able to understand the language of the dataset.

However, using DBpedia Spotlight as an example, we briefly show how an annotation system can be adapted.

DBpedia Spotlight

The documentation of DBpedia Spotlight lists the supported languages. To switch to one of them, we simply have to adapt the URL of the web service in the annotators.properties file. By default, GERBIL uses

http://model.dbpedia-spotlight.org/en/

but this can easily be changed to, e.g., the French endpoint

http://model.dbpedia-spotlight.org/fr/

Instead of changing the existing definition, you could also copy the definition of the DBpedia Spotlight annotation system. In that case, please make sure that the copy uses a parameter key other than spotlight and a name other than "DBpedia Spotlight". In the following, we define an annotation system DBpedia Spotlight (FR) with the parameter key spotlightFR:

org.aksw.gerbil.annotators.definition.spotlightFR.name=DBpedia Spotlight (FR)
org.aksw.gerbil.annotators.definition.spotlightFR.experimentType=OKE_Task1
org.aksw.gerbil.annotators.definition.spotlightFR.cacheable=true
org.aksw.gerbil.annotators.definition.spotlightFR.class=org.aksw.gerbil.annotator.impl.spotlight.SpotlightAnnotator
org.aksw.gerbil.annotators.definition.spotlightFR.constructorArgs=http://model.dbpedia-spotlight.org/fr/
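If definitions for several languages are needed, writing them by hand is error-prone. The following Java sketch is a hypothetical helper (not part of GERBIL) that generates a definition following the spotlightFR pattern above for a given language code. It assumes that every language-specific endpoint has the form http://model.dbpedia-spotlight.org/<lang>/ (check the Spotlight documentation for the languages actually offered) and that the generated lines are pasted into annotators.properties by hand. As a quick syntax check, it parses the generated lines again with java.util.Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

public class SpotlightDefinitionSketch {

    // Common prefix of all annotator definitions in annotators.properties.
    private static final String PREFIX = "org.aksw.gerbil.annotators.definition.";

    // Hypothetical helper: builds the five definition lines for a language code such as "fr".
    // Assumption: the endpoint follows the pattern http://model.dbpedia-spotlight.org/<lang>/.
    public static String spotlightDefinition(String lang) {
        String upper = lang.toUpperCase(Locale.ROOT);
        String key = PREFIX + "spotlight" + upper + ".";
        return key + "name=DBpedia Spotlight (" + upper + ")\n"
                + key + "experimentType=OKE_Task1\n"
                + key + "cacheable=true\n"
                + key + "class=org.aksw.gerbil.annotator.impl.spotlight.SpotlightAnnotator\n"
                + key + "constructorArgs=http://model.dbpedia-spotlight.org/" + lang + "/\n";
    }

    public static void main(String[] args) throws IOException {
        String definition = spotlightDefinition("fr");
        System.out.print(definition);

        // Parse the generated lines with java.util.Properties as a quick syntax
        // check: the endpoint URL should come back unchanged.
        Properties props = new Properties();
        props.load(new StringReader(definition));
        System.out.println(props.getProperty(PREFIX + "spotlightFR.constructorArgs"));
    }
}
```

Calling spotlightDefinition("fr") yields the same five lines as the spotlightFR example; other language codes produce keys such as spotlightDE with a matching name.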

As can be seen, the URL of the French endpoint is passed as the argument of the constructor (see https://github.com/dice-group/gerbil/blob/master/src/main/java/org/aksw/gerbil/annotator/impl/spotlight/SpotlightAnnotator.java#L60).
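To illustrate the general idea of a class property combined with a constructorArgs property, the following Java sketch instantiates a class by name via reflection and hands it a single String argument. DemoAnnotator is a hypothetical stand-in for SpotlightAnnotator, and this is only a sketch of the technique; GERBIL's actual loading code may work differently.

```java
import java.lang.reflect.Constructor;

public class ConstructorArgsSketch {

    // Hypothetical annotator that only remembers the endpoint it was given.
    public static class DemoAnnotator {
        private final String serviceUrl;

        public DemoAnnotator(String serviceUrl) {
            this.serviceUrl = serviceUrl;
        }

        public String getServiceUrl() {
            return serviceUrl;
        }
    }

    // Loads the class with the given binary name and calls its public
    // single-String constructor with the given argument.
    public static Object instantiate(String className, String arg) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Constructor<?> constructor = clazz.getConstructor(String.class);
        return constructor.newInstance(arg);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Binary name of the nested demo class; in a real definition this role
        // is played by the value of the "class" property.
        String className = DemoAnnotator.class.getName();
        Object annotator = instantiate(className, "http://model.dbpedia-spotlight.org/fr/");
        System.out.println(((DemoAnnotator) annotator).getServiceUrl());
    }
}
```

Because the endpoint arrives as a plain constructor argument, switching languages only requires a different string in the properties file, with no change to the adapter class.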

Note that not all annotation systems can be adapted this easily. Again, it depends on a) whether an annotation system supports the desired language and b) how its API accepts the language parameter. In many cases, the language can be submitted as part of the URL (as described above). However, other annotators might expect the parameter in a different way, which can make it necessary to adapt the implementation of the system adapter itself.
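The difference between these two cases can be sketched in Java. Both adapter classes below are hypothetical and not part of GERBIL; the query parameter name lang and the host annotator.example.org are made-up assumptions for illustration only. In the first class, the language is part of the endpoint URL, so passing a different constructor argument is enough. In the second, the adapter code itself has to attach the language to every request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LanguageParameterSketch {

    // Style 1 (as with DBpedia Spotlight above): the language is part of the
    // endpoint URL, so only the configured constructor argument changes.
    public static class PathLanguageAdapter {
        private final URI endpoint;

        public PathLanguageAdapter(String endpoint) {
            this.endpoint = URI.create(endpoint);
        }

        public URI requestUri() {
            return endpoint;
        }
    }

    // Style 2 (hypothetical service): the language is sent as a query parameter
    // named "lang", so the adapter code must append it to each request.
    public static class QueryLanguageAdapter {
        private final String endpoint;
        private final String lang;

        public QueryLanguageAdapter(String endpoint, String lang) {
            this.endpoint = endpoint;
            this.lang = lang;
        }

        public URI requestUri() {
            String encoded = URLEncoder.encode(lang, StandardCharsets.UTF_8);
            return URI.create(endpoint + "?lang=" + encoded);
        }
    }

    public static void main(String[] args) {
        System.out.println(new PathLanguageAdapter("http://model.dbpedia-spotlight.org/fr/").requestUri());
        System.out.println(new QueryLanguageAdapter("http://annotator.example.org/annotate", "fr").requestUri());
    }
}
```

For an annotator of the second kind, editing annotators.properties alone is not sufficient unless its adapter already accepts the language as an argument.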