
GERBIL now supports Question Answering (QA)!

Experiment Types

  • Question Answering: The first experiment is the traditional experiment as defined by evaluation campaigns like OKBQA and QALD. It aims to measure the capability of a system to answer questions correctly. A system's answer and the corresponding gold standard answer are regarded as sets of URIs and literals, and the traditional precision, recall and F-measure are used for evaluation.
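
As an illustration (a minimal sketch, not GERBIL's internal code), the set-based measures can be computed as follows, treating both answers as sets of URI/literal strings:

import java.util.HashSet;
import java.util.Set;

public class SetMeasures {

    // Precision, recall and F-measure over a system answer set and the
    // corresponding gold standard answer set.
    public static double[] evaluate(Set<String> system, Set<String> gold) {
        Set<String> truePositives = new HashSet<>(system);
        truePositives.retainAll(gold); // answers that are both returned and expected

        double precision = system.isEmpty() ? 0.0 : (double) truePositives.size() / system.size();
        double recall = gold.isEmpty() ? 0.0 : (double) truePositives.size() / gold.size();
        double f1 = (precision + recall) == 0.0 ? 0.0
                : 2.0 * precision * recall / (precision + recall);
        return new double[] { precision, recall, f1 };
    }
}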

Sub-Experiments

  • Resource to Knowledge Base (C2KB): This sub-experiment aims at identifying all resources that are relevant for the given question. It is known from the core GERBIL as Concept to KB. The evaluation calculates precision, recall and F-measure based on a comparison of the expected resource URIs and the URIs returned by the QA system. Instead of a simple string comparison, we make use of the advanced meaning matching implementation offered by GERBIL and explained in release 1.2.2.
  • Properties to Knowledge Base (P2KB): This sub-experiment is a special form of C2KB. The system has to identify all properties that are relevant for the given question; the evaluation follows the process of the C2KB experiment.
  • Relation to Knowledge Base (RE2KB): This sub-experiment focuses on the triples that have to be extracted from the question and that are needed to generate the SPARQL query retrieving the correct answers. These triples can contain resources, variables and literals. The evaluation calculates precision, recall and F-measure based on a comparison of the expected triples and the triples returned by the QA system. To count as a true positive, a returned triple has to match an expected triple. Two triples are counted as matching if they contain the same resources at the same positions; if they contain variables, the positions of the variables have to be the same but the variable names are ignored; if they contain literals, the literal values have to be the same (see the matching sketch after this list).
  • Answer Type (AT): The identification of the answer type is an important part of a QA system. We distinguish 5 different answer types taken from the QALD-6 benchmarking campaign, i.e., date, number, string, boolean and resource, where resource can be a single URI or a set of URIs. A single answer type is expected for each question; this is the type for which the F-measure is calculated. Note that this sub-experiment can only generate meaningful results if the eQALD JSON format is used. For our running example, we expect resource as the answer type.
  • Answer Item Type to Knowledge Base (AIT2KB): The answer item types are the rdf:type information of the returned resources. Precision, recall and F-measure are calculated based on the set of expected types. If the expected answer set of a question does not contain resources, the set of answer item types is expected to be empty.
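
The following is a minimal sketch (not GERBIL's actual implementation) of the RE2KB matching rule described above; the Node and TripleMatcher names are illustrative only:

public class TripleMatcher {

    enum Kind { RESOURCE, VARIABLE, LITERAL }

    static class Node {
        final Kind kind;
        final String value; // URI, variable name, or literal value
        Node(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    static boolean nodesMatch(Node expected, Node returned) {
        if (expected.kind != returned.kind) return false;
        // Variable names are ignored; only the position has to agree.
        if (expected.kind == Kind.VARIABLE) return true;
        // Resources and literals have to carry the same URI / value.
        return expected.value.equals(returned.value);
    }

    // A returned triple matches an expected triple if all three positions match.
    static boolean triplesMatch(Node[] expected, Node[] returned) {
        for (int i = 0; i < 3; i++) {
            if (!nodesMatch(expected[i], returned[i])) return false;
        }
        return true;
    }
}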

Web service interface

A template Java service can be found at https://github.com/dice-group/GerbilQA-Benchmarking-Template

A QA system that cannot be found in the list of available systems can still be benchmarked if it implements the following web service interface. The interface is basically built upon the definition of the QALD JSON format. Note that the answers value follows the W3C recommendation https://www.w3.org/TR/sparql11-results-json/ .

Every question is sent as a POST request containing the parameter query, whose value is a UTF-8 encoded string, and the parameter lang, which provides the language of the question as an ISO 639-1 language code.

query=What is the capital of Germany?&lang=en
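
As an illustration, here is a minimal sketch of such an endpoint using the JDK's built-in HttpServer (Java 10+). The class name, port and path are illustrative, and the actual answering logic is elided:

import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class QAEndpoint {

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/gerbil", exchange -> {
            // The body arrives form-encoded, e.g.:
            // query=What+is+the+capital+of+Germany%3F&lang=en
            String body;
            try (InputStream in = exchange.getRequestBody()) {
                body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            String query = null, lang = null;
            for (String pair : body.split("&")) {
                String[] kv = pair.split("=", 2);
                String key = URLDecoder.decode(kv[0], StandardCharsets.UTF_8);
                String val = kv.length > 1 ? URLDecoder.decode(kv[1], StandardCharsets.UTF_8) : "";
                if ("query".equals(key)) query = val;
                else if ("lang".equals(key)) lang = val;
            }
            // ... answer the question and serialize the QALD JSON shown below ...
            byte[] response = "{\"questions\": []}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, response.length);
            exchange.getResponseBody().write(response);
            exchange.close();
        });
        server.start();
    }
}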

GERBIL expects a JSON-based response using the QALD format.

{
	"questions": [{
		"id": "1",
		"question": [{
			"language": "en",
			"string": "Which German cities have more than 250000 inhabitants?"
		}],
		"query": {
			"sparql": "SELECT DISTINCT ?uri WHERE { { ?uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/City> . } UNION { ?uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Town> . }  ?uri <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Germany> .  ?uri <http://dbpedia.org/ontology/populationTotal> ?population .  FILTER ( ?population > 250000 ) } "
		},
		"answers": [{
			"head": {
				"vars": [
					"uri"
				]
			},
			"results": {
				"bindings": [{
					"uri": {
						"type": "uri",
						"value": "http://dbpedia.org/resource/Bonn"
					}
				}]
			}
		}]
	}]
}
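
As a sketch, the response above could be assembled with any JSON library; the following assumes org.json, which is not a requirement of GERBIL:

import java.util.List;
import org.json.JSONArray;
import org.json.JSONObject;

public class QaldResponseBuilder {

    // Builds a single-question QALD JSON response with one variable "uri"
    // and one binding per answer URI.
    public static String build(String questionId, String lang, String questionString,
                               String sparql, List<String> answerUris) {
        JSONArray bindings = new JSONArray();
        for (String uri : answerUris) {
            bindings.put(new JSONObject().put("uri",
                    new JSONObject().put("type", "uri").put("value", uri)));
        }
        JSONObject answers = new JSONObject()
                .put("head", new JSONObject().put("vars", new JSONArray().put("uri")))
                .put("results", new JSONObject().put("bindings", bindings));
        JSONObject question = new JSONObject()
                .put("id", questionId)
                .put("question", new JSONArray().put(new JSONObject()
                        .put("language", lang).put("string", questionString)))
                .put("query", new JSONObject().put("sparql", sparql))
                .put("answers", new JSONArray().put(answers));
        return new JSONObject().put("questions", new JSONArray().put(question)).toString(2);
    }
}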

We also support the eQALD JSON format (see http://www.semantic-web-journal.net/content/benchmarking-question-answering-systems):

{
	"dataset": {
		"id": "the dataset id",
		"metadata": "some metadata..."
	},
	"questions": [{
		"id": "the question id",
		"metadata": {
			"answertype": "Date|Number|String|ListOfResource ",
			"hybrid": "TRUE|FALSE",
			"aggregation": "TRUE|FALSE",
			"answeritemtype": [
				"e.g., dbo:Person"
			]
		},
		"question": [{
			"language": "e.g. en or de",
			"string": "The question in that particular language...",
			"keywords": "question as keywords",
			"annotations": [{
				"char_begin": "5...",
				"char_end": "11...",
				"URI": "e.g. dbr:Berlin...",
				"type": "CLASS|PROPERTY|ENTITY"
			}]
		}],
		"query": {
			"SPARQL": "Question as SPARQL",
			"schemaless ": "Schema-less SPARQL"
		},
		"answers": [{
			"bindings": [{
				"result": {
					"type": "...",
					"value": "..."
				}
			}],
			"confidence": "e.g. 0.9..."
		}]
	}]
}

We accept, but strongly discourage, the use of the old QALD XML format; see QALD-4 and earlier at https://github.com/ag-sc/QALD/tree/master/4/data .