LC-QuAD

Large-scale Complex Question Answering Dataset

📢 Announcement: LC-QuAD 2.0 is now released. Check out our website: http://lc-quad.sda.tech

Download

Links

Introduction

We release and maintain a gold-standard KBQA (Question Answering over Knowledge Bases) dataset containing 5000 pairs of questions and SPARQL queries. LC-QuAD uses DBpedia version 04-2016 as the target KB.

Usage

License: The dataset is released under the GPL 3.0 license. You can download it, or read on to learn more.

Versioning: We use DBpedia version 04-2016 as our target KB. The public DBpedia endpoint (http://dbpedia.org/sparql) no longer serves this version, so many SPARQL queries may not retrieve any answer there. We strongly recommend hosting this version locally. To do so, see this guide.
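As a rough sketch of how queries could be sent to a locally hosted copy, the snippet below builds a request following the SPARQL 1.1 Protocol (a GET with a `query` parameter) using only the Python standard library. The endpoint address, the function names `build_request` and `run_select`, and the sample query are illustrative assumptions; substitute whatever address your local installation exposes.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address of a locally hosted endpoint; adjust to your setup.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def build_request(query, endpoint=LOCAL_ENDPOINT):
    """Build an HTTP GET request that carries the query as a URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query, endpoint=LOCAL_ENDPOINT, timeout=30):
    """Send a SELECT query and return the list of result bindings."""
    req = build_request(query, endpoint)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        results = json.load(resp)
    return results["results"]["bindings"]


# Building the request does not touch the network; run_select() does.
request = build_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
```

Calling `run_select(query)` with a `query` string taken from the dataset would then return the answer bindings, provided the local endpoint is running.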

Splits: We release the dataset split into training and test sets in an 80:20 ratio.

Format: The dataset is released in JSON dumps, where the key corrected_question contains the question, and query contains the corresponding SPARQL query.

Each datapoint in the generated dataset has the following JSON structure:

{
  "_id": "Unique ID of this datapoint",
  "corrected_question": "Corrected, Final Question",
  "id": "Template ID",
  "query": "SPARQL Query",
  "template": "Template used to create SPARQL Query",
  "intermediary_question": "Automatically generated, grammatically incorrect question"
}
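As a minimal sketch of how this structure can be consumed, the snippet below extracts (question, query) pairs from a JSON dump. It assumes a dump holds a JSON list of datapoints shaped as above; the function name `load_pairs` and the sample record are illustrative and not taken from the dataset.

```python
import json


def load_pairs(json_text):
    """Return (question, SPARQL query) tuples from a JSON dump.

    Assumes the dump is a JSON list of datapoints that carry the
    'corrected_question' and 'query' keys described above.
    """
    datapoints = json.loads(json_text)
    return [(d["corrected_question"], d["query"]) for d in datapoints]


# Made-up record for illustration only; it is not taken from the dataset.
sample_dump = json.dumps([
    {
        "_id": "example-1",
        "corrected_question": "Who is the author of Example Book?",
        "id": 0,
        "query": "SELECT ?x WHERE { <http://example.org/Example_Book> "
                 "<http://example.org/author> ?x }",
        "template": "SELECT ?x WHERE { <e> <p> ?x }",
        "intermediary_question": "Who is the author of Example Book ?",
    }
])

pairs = load_pairs(sample_dump)
```

For a downloaded split, the same function can be applied to the file contents, for example `load_pairs(open(path).read())`, where `path` is wherever you saved the file.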

Cite

@inproceedings{trivedi2017lc,
  title={Lc-quad: A corpus for complex question answering over knowledge graphs},
  author={Trivedi, Priyansh and Maheshwari, Gaurav and Dubey, Mohnish and Lehmann, Jens},
  booktitle={International Semantic Web Conference},
  pages={210--218},
  year={2017},
  organization={Springer}
}

Benchmarking/Leaderboard

We're in the process of automating the benchmarking process (and updating results on our webpage). In the meantime, please get in touch with us at priyansh.trivedi@uni-bonn.de, and we'll do it manually. Apologies for the inconvenience.

Methodology

Overview

Automatically create SPARQL queries.
Convert SPARQL queries to intermediary NLQs.
Manually correct intermediary NLQs to create questions.

We start with a set of seed entities and a predicate whitelist. Using the whitelist, we generate 2-hop subgraphs around the seed entities. Treating a seed entity as the supposed answer, we overlay SPARQL templates onto the subgraph and generate SPARQL queries.
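To make the idea concrete, here is a toy sketch of this step. It is not the authors' implementation: the triples, the whitelist, the single one-triple template, and the function names are all made up for illustration, and the real pipeline works against DBpedia with its own template set.

```python
def ex(name):
    """Build a made-up URI in the example.org namespace."""
    return "http://example.org/" + name


# Toy triples standing in for a KB; every URI here is hypothetical.
TRIPLES = [
    (ex("Book_A"), ex("author"), ex("Alice")),
    (ex("Book_B"), ex("author"), ex("Alice")),
    (ex("Alice"), ex("birthPlace"), ex("Town_X")),
    (ex("Alice"), ex("spouse"), ex("Bob")),
    (ex("Town_X"), ex("country"), ex("Land_Y")),
]
WHITELIST = {ex("author"), ex("birthPlace"), ex("country")}

# Hypothetical template: the seed entity fills the answer slot ?uri.
TEMPLATE = "SELECT DISTINCT ?uri WHERE { <%(e)s> <%(p)s> ?uri }"


def two_hop_subgraph(seed, triples, whitelist):
    """Collect whitelisted triples within two hops of the seed entity."""
    subgraph, frontier = [], {seed}
    for _ in range(2):
        reached = set()
        for s, p, o in triples:
            if p in whitelist and (s in frontier or o in frontier):
                if (s, p, o) not in subgraph:
                    subgraph.append((s, p, o))
                reached.update((s, o))
        frontier = reached
    return subgraph


def queries_for_seed(seed, subgraph):
    """Instantiate the template for every triple whose object is the seed."""
    return [TEMPLATE % {"e": s, "p": p} for s, p, o in subgraph if o == seed]


subgraph = two_hop_subgraph(ex("Alice"), TRIPLES, WHITELIST)
queries = queries_for_seed(ex("Alice"), subgraph)
```

In this toy data, the `spouse` triple is excluded because its predicate is not whitelisted, and the `country` triple is only reached on the second hop.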

For each SPARQL template, and based on certain conditions, we assign a hand-made natural-language (NL) question template to the generated SPARQL queries. Refer to this diagram to understand the nomenclature used in templates.
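The pairing of a SPARQL template with a question template can be pictured with a small sketch. The template ID, the template wording, and the label heuristic below are assumptions made for illustration; they are not the templates shipped with the dataset.

```python
# Hypothetical question templates keyed by a made-up template ID.
QUESTION_TEMPLATES = {
    1: "What is the %(p)s of %(e)s ?",
}


def label(uri):
    """Derive a rough label from the last URI segment (a simplification)."""
    return uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " ")


def intermediary_question(template_id, entity_uri, predicate_uri):
    """Fill the question template with labels of the entity and predicate."""
    template = QUESTION_TEMPLATES[template_id]
    return template % {"e": label(entity_uri), "p": label(predicate_uri)}


question = intermediary_question(
    1, "http://example.org/Book_A", "http://example.org/author"
)
```

Here `question` becomes "What is the author of Book A ?", which shows why the template-generated text is only an intermediary form that still needs manual correction.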

Finally, we follow a two-step (Correct, Review) system to generate a grammatically correct question for every template-generated one.

Changelog

0.1.3 - 19-06-2018

Published train-test splits
Website Updated

0.1.2 - 28-01-2018

Updated public website
Dataset now available in QALD format
Leaderboard underway

0.1.1 - 27-10-2017

Fixed a bug with rdf:type filter in SPARQL
Updated data_set.json
Updated templates.py

0.1.0 - 01-05-2017

First version released
lc-quad.sda.tech published
