Relation Extraction

Relation Extraction (RE) is the task of detecting relations between entities in (unstructured or unlabelled) text. RE is an actively researched field: recent years have brought many interesting papers and promising algorithms, along with a multitude of high-quality datasets.

This repository's goal is to provide an overview of the current research challenges and how they are addressed.

If a link is broken, feel free to update it and open a pull request. (Or just notify me, that's fine too.)

Surveys:

  1. (Bach and Badaskar, 2007) A Review of Relation Extraction
  2. (de Abreu et al., 2013) A review on Relation Extraction with an eye on Portuguese
  3. (Konstantinova, 2014) Review of Relation Extraction Methods: What is New Out There?
  4. (Asghar, 2016) Automatic Extraction of Causal Relations from Natural Language Texts: A Comprehensive Survey
  5. (Kumar, 2017) A Survey of Deep Learning Methods for Relation Extraction
  6. (Pawar et al., 2017) Relation extraction: A survey
  7. (Cui et al., 2017) A Survey on Relation Extraction
  8. (Chakraborty et al., 2019) Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs
  9. (Han et al., 2020) More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction
  10. (Fu et al., 2020) A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges
  11. (Yang et al., 2021) A Survey on Extraction of Causal Relations from Natural Language Text
  12. (Nayak et al., 2021) Deep Neural Approaches to Relation Triplets Extraction: A Comprehensive Survey
  13. (Wang et al., 2021) Deep Neural Network Based Relation Extraction: An Overview
  14. (Aydar et al., 2021) Neural Relation Extraction: A Review
  15. (Pawar et al., 2021) Techniques for Jointly Extracting Entities and Relations: A Survey
  16. (Lan et al., 2021) A Survey on Complex Knowledge Base Question Answering: Methods, Challenges and Solutions

Knowledge Graphs / Knowledge Bases

  1. DBpedia Website / GitHub / Paper
  2. Freebase Website / DEPRECATED / Paper
  3. YAGO Website / Latest Release / Paper
  4. Wikidata Website / Paper

Datasets:

Here is a distribution of the most-used datasets, showing their usage frequency across more than 550 surveyed papers.

[Figure: usage frequency of the most common datasets across the surveyed papers]

If you created a new dataset or found something missing, please don't hesitate to create a pull request to add it here.

  1. Datasets for Semantic Parsing

    1. LC-QuAD Paper / Website / Repository
    2. LC-QuAD 2.0 Paper / Website
    3. ComplexWebQuestions Paper / Website
    4. WebQuestionsSP Paper / Download
    5. QALD Series Website
    6. CompositionalFreebaseQuestions (CFQ) Paper / Repository
  2. Datasets for Information Retrieval

    1. SimpleQuestions Paper / Repository
    2. WebQuestions Paper / Website
    3. ComplexQuestions (unfortunately, two different datasets share the name ComplexQuestions)
      1. ComplexQuestions (sometimes referred to as CompQ) Paper / Repository
      2. ComplexQuestions Paper / Website – Note that this dataset was provided by a different set of authors
    4. MetaQA Paper / Repository
  3. Datasets for Reinforcement Learning

    1. UMLS Paper / Repository (MINERVA Repository)
    2. NELL-995 Paper / Repository (MINERVA Repository)
    3. Kinship Paper / Repository (MINERVA Repository)
    4. FB15K-237 Paper (Original FB15K) / Paper (FB15K-237 Variant) / Download (FB15K-237)
    5. WN18RR Paper / Repository
    6. Countries Paper / Repository (MINERVA Repository)
  4. Datasets for Hybrid KGQA

    1. CommonSenseQA Paper / Website
    2. OpenBookQA Paper / Website

Performance Leaderboard per Dataset

When reporting the results of your approach, be as precise as possible; you would be surprised how many papers report ambiguous results. If your approach outperforms all others on a certain benchmark, make sure to mark it in bold.

You should be familiar with the following metrics, but in case you are not, here is a short recap:

TP = True Positives
FP = False Positives
TN = True Negatives
FN = False Negatives

P = Precision

Measures how many of your positive predictions are actually correct.
Formula: P = TP / (TP + FP)

R = Recall

Measures how many of all existing positive labels you actually found.
Formula: R = TP / (TP + FN)

F = F1

The harmonic mean of precision and recall.
Formula: F1 = 2 * P * R / (P + R) = 2 * TP / (2 * TP + FP + FN)
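As a quick sanity check when reporting numbers, all three metrics can be computed directly from the confusion counts. A minimal Python sketch (the counts at the bottom are made-up example values):

```python
def precision(tp: int, fp: int) -> float:
    # Of all positive predictions, how many were correct?
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    # Of all existing positives, how many did we find?
    return tp / (tp + fn) if tp + fn else 0.0

def f1(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall,
    # equivalent to 2*TP / (2*TP + FP + FN).
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

# Made-up example counts:
tp, fp, fn = 80, 20, 40
print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # 0.666...
print(f1(tp, fp, fn))     # 0.727...
```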

RE = Relation Extraction Subtask

This metric refers solely to the RE subtask, i.e. how well the correct relations are identified. It is distinct from E2E.

E2E = End to End

This metric reports the result of running an algorithm end to end, i.e. the whole pipeline from input question to final answer, on the dataset's test set.



Leaderboards are provided for the following datasets:

  1. QALD-Series
  2. LC-QuAD
  3. FreebaseQA
  4. SimpleQuestions
  5. WebQuestions + Derivatives
  6. Free917
  7. ComplexQuestions
  8. MetaQA
  9. PathQuestion
  10. MSF
  11. NYT
  12. Hybrid QA
  13. Reinforcement Learning
  14. KBC
  15. PQA



QALD-Series

QALD-5
HCqa (Asadifar et al., 2019)*: P = 0.70, R = 1.00, F = 0.81

*) Tested only on 10 questions

QALD-6
HCqa (Asadifar et al., 2019)*: P = 0.42, R = 0.42, F = 0.52

*) Tested only on 25 questions

QALD-7
SLING (Mihindukulasooriya et al., 2020): P = 0.57, R = 0.76, F = 0.65
EARL (Dubey et al., 2018): RE = 0.47
GGNN (Sorokin and Gurevych, 2018): P = 0.2686, R = 0.3179, F = 0.2588

QALD-9
SLING (Mihindukulasooriya et al., 2020): P = 0.50, R = 0.64, F = 0.56



LC-QuAD

LC-QuAD 1
SLING (Mihindukulasooriya et al., 2020): P = 0.41, R = 0.44, F = 0.48
EARL (Dubey et al., 2018): RE = 0.36



FreebaseQA

FreebaseQA (Paper / Repository)
Retrieve and Re-rank (Wang et al., 2021): E2E = 0.517



SimpleQuestions

SimpleQuestions
AdvT-MMRD (Zhang et al., 2020): RE = 0.938, E2E = 0.790
MLTA (Wang et al., 2019): RE = 0.824
Question Matching (Abolghasemi et al., 2020): RE = 0.9341
Relation Splitting (Hsiao et al., 2017): E2E = 0.767
KSA-BiGRU (Zhu et al., 2019): P = 0.867, R = 0.848, F = 0.849, E2E = 0.731
Alias Matching (Buzaaba and Amagasa, 2021): RE = 0.8288, E2E = 0.7464
Synthetic Data (Sidiropoulos et al., 2020): RE* (unseen domain) = 0.7041, E2E (seen domain) = 0.77, E2E* (unseen domain) = 0.6657
Transfer Learning with BERT (Lukovnikov et al., 2020): RE = 0.836, E2E = 0.773
Retrieve and Re-rank (Wang et al., 2021): E2E = 0.797
HR-BiLSTM (Yu et al., 2017): RE = 0.933, E2E = 0.787
Multi-View Matching (Yu et al., 2018): RE = 0.9375

*) Average of Micro + Macro

SimpleQuestions-Balanced (Paper / Repository)
HR-BiLSTM (Yu et al., 2017): RE* (seen) = 0.891, RE* (unseen) = 0.412, RE* (seen+unseen avg.) = 0.673
Representation Adapter (Wu et al., 2019): RE* (seen) = 0.8925, RE* (unseen) = 0.7515, RE* (seen+unseen avg.) = 0.83

*) Average of Micro + Macro




WebQuestions + Derivatives

WebQuestions
Support Sentences (Li et al., 2017): P = 0.572, R = 0.396, F = 0.382, E2E = 0.423
QARDTE (Zheng et al., 2018): P = 0.512, R = 0.613, F = 0.558, RE = 0.843
HybQA (Mohamed et al., 2017): F = 0.57

WebQuestionsSP
HR-BiLSTM (Yu et al., 2017): RE = 0.8253
UHOP (Chen et al., 2019) (w/ HR-BiLSTM): RE = 0.8260
OPQL (Sun et al., 2021): RE = 0.8540, E2E = 0.519
Multi-View Matching (Yu et al., 2018): RE = 0.8595
Masking Mechanism (Chen et al., 2018): RE = 0.77

WebQuestionsSP-WD (Paper / Repository)
GGNN (Sorokin and Gurevych, 2018): P = 0.2686, R = 0.3179, F = 0.2588



Free917

Free917 (Original Paper / Data)
QARDTE (Zheng et al., 2018): P = 0.683, R = 0.679, F = 0.663



ComplexQuestions

ComplexQuestions
HCqa (Asadifar et al., 2019): F = 0.536



MetaQA

MetaQA
OPQL (Sun et al., 2021): E2E (2-Hop) = 0.885, E2E (3-Hop) = 0.871
RDAS (Wang et al., 2021): E2E (1-Hop) = 0.991, E2E (2-Hop) = 0.97, E2E (3-Hop) = 0.856
Incremental Sequence Matching (Lan et al., 2019): F = 0.981, E2E (1-Hop) = 0.963, E2E (2-Hop) = 0.991, E2E (3-Hop) = 0.996



PathQuestion

PathQuestion (Paper / Repository)
Incremental Sequence Matching (Lan et al., 2019): F = 0.96, E2E* = 0.967
RDAS (Wang et al., 2021): E2E (2-Hop) = 0.736, E2E (3-Hop) = 0.910

*) 2-Hop and 3-Hop mixed




MSF

MSF (Paper / Repository)
OPQL (Sun et al., 2021): E2E (2-Hop) = 0.492, E2E (3-Hop) = 0.297



NYT

NYT (Paper / Data)
Deep RL (Qin et al., 2018): F* = 0.778
ReQuest (Wu et al., 2018): P = 0.404, R = 0.48, F = 0.439

*) Average




Hybrid QA

ComplexWebQuestions
OPQL (Sun et al., 2021): E2E = 0.407

OpenBookQA
MHGRN (Feng et al., 2020): E2E = 0.806
QA-GNN (Yasunaga et al., 2021): E2E = 0.828

CommonsenseQA
MHGRN (Feng et al., 2020): E2E = 0.765
QA-GNN (Yasunaga et al., 2021): E2E = 0.761



Reinforcement Learning

Kinship
MINERVA (Das et al., 2018): E2E = 0.605
Reward Shaping (Lin et al., 2018): E2E = 0.811

UMLS
MINERVA (Das et al., 2018): E2E = 0.728
Reward Shaping (Lin et al., 2018): E2E = 0.902

Countries
MINERVA (Das et al., 2018): E2E* = 0.9582

*) Average of S1, S2 and S3

WN18RR
MINERVA (Das et al., 2018): E2E = 0.413
Reward Shaping (Lin et al., 2018): E2E = 0.437

FB15K-237
MINERVA (Das et al., 2018): E2E = 0.217
Reward Shaping (Lin et al., 2018): E2E = 0.329

NELL-995
MINERVA (Das et al., 2018): E2E = 0.663
Reward Shaping (Lin et al., 2018): E2E = 0.656



KBC

KBC (Paper / Repository)
ROP (Yin et al., 2018): E2E* = 0.7616

*) Here: mean average precision (MAP)




PQA

PQA (Paper / Repository)
ROP (Yin et al., 2018): E2E = 0.907



Research Challenges:

For each solution to a challenge, a short description is provided. If you have written a paper that deals with one of these challenges, you can create a pull request and add a link to your paper together with a short description. If it fits none of the challenges listed here, you may create a new entry and add your paper there, along with a short description of the new challenge.

Table of Contents


  1. Lexical Gap
  2. Incomplete Knowledge Graphs
  3. Disambiguation Problem
  4. Noise From Distant Supervision
  5. Inclusion of Structured Information From Subgraphs
  6. Hybrid Question-Answering
  7. New and Unseen Domains
  8. Integration of Language Models for Relation Extraction
  9. Candidate Generation
  10. Low Relation Extraction Accuracy

Lexical Gap

The lexical gap refers to situations in which the natural-language expression of a relation differs from its representation in the KB (this problem is closely related to relation linking). When faced with the question "Where was Angela Merkel born?", the corresponding relation "birthPlace" does not appear in the question at all. Exact matching procedures therefore fail, and a softer matching mechanism is required (a toy illustration follows the list below).

  1. SLING (Mihindukulasooriya et al., 2020)
    • Integrate abstract meaning representation (AMR) to improve question understanding
  2. AdvT-MMRD (Zhang et al., 2020)
    • Use semantic and literal question-relation matching and incorporate entity type information with adversarial training
  3. MLTA (Wang et al., 2019)
    • Similarity computation between the question and relation candidates on multiple levels using an attention mechanism
  4. Support Sentences (Li et al., 2017)
    • Enrich candidate pairs with support sentences from an external source
  5. Question Matching (Abolghasemi et al., 2020)
    • Retrieve the stored question that best matches the input question
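To make the failure of exact matching concrete, here is a toy sketch in the spirit of alias matching: looking up relation names directly misses "birthPlace", while a small alias lexicon bridges the gap. The lexicon entries are made up for illustration; real systems learn such mappings or use embedding similarity instead.

```python
# Illustrative relation lexicon mapping KB relations to
# natural-language trigger words (all entries are made up).
relation_aliases = {
    "birthPlace": {"born", "birthplace", "birth"},
    "spouse": {"married", "wife", "husband", "spouse"},
    "almaMater": {"studied", "graduated", "university"},
}

question = "Where was Angela Merkel born?"
tokens = {t.strip("?.,").lower() for t in question.split()}

# Exact matching against the relation names themselves fails:
print([r for r in relation_aliases if r.lower() in tokens])  # []

# Soft matching via the alias lexicon bridges the lexical gap:
print([r for r, aliases in relation_aliases.items() if aliases & tokens])
# ['birthPlace']  ("born" is an alias of birthPlace)
```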

Incomplete Knowledge Graphs

One of the best-known problems in KGQA is that KGs are incomplete (Min et al., 2013), i.e. certain relations or entities are missing, which is natural considering how vast and complex the body of human knowledge is (and that it keeps growing daily). This problem is especially evident in highly technical and specialised areas.

  1. OPQL (Sun et al., 2021)
    • Construct a virtual knowledge base
  2. MINERVA (Das et al., 2018)
    • Infer missing knowledge using RL
  3. Reward Shaping (Lin et al., 2018)
    • Improve reward mechanism of MINERVA
  4. ROP (Yin et al., 2018)
    • Predict KG paths using an RNN to infer new information

Disambiguation Problem

A difficult challenge for QA systems to overcome is the ambiguity of natural language: certain relations may have the same name but a different meaning depending on the context. An example on the KB level (taken from Hsiao et al., 2017) is the Freebase relation genre, which appears both as film.film.genre and as music.artist.genre.

  1. Relation Splitting (Hsiao et al., 2017)
    • Further split a relation into its type and property parts (see the sketch after this list)
  2. KSA-BiGRU (Zhu et al., 2019)
    • Computing a probability distribution for every relation
  3. Alias Matching (Buzaaba and Amagasa, 2021)
    • Match alias from question with KB and pick most likely relation
  4. EARL (Dubey et al., 2018)
    • Perform entity and relation linking jointly
  5. HR-BiLSTM (Yu et al., 2017)
    • Use a hierarchical BiLSTM model and entity re-ranking
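A minimal sketch of the relation-splitting idea: a Freebase-style relation name of the form domain.type.property is split so that the type and the property can be matched separately. The helper below is illustrative, not the authors' implementation.

```python
# A Freebase relation name has the form "domain.type.property".
# Splitting it makes the ambiguity of a shared property explicit,
# so it can be resolved using the entity's type.
def split_relation(relation: str) -> tuple[str, str]:
    *type_parts, prop = relation.split(".")
    return ".".join(type_parts), prop

print(split_relation("film.film.genre"))     # ('film.film', 'genre')
print(split_relation("music.artist.genre"))  # ('music.artist', 'genre')
# Same property "genre", but different types.
```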

Noise From Distant Supervision

In some domains, training data is sparse, and annotating it correctly typically involves manual human labour. This process is very time-consuming and therefore does not scale. To overcome this problem, distant supervision (DS) was proposed, which generates training data automatically. The problem with DS is that the resulting training data can be very noisy, which in turn degrades the performance of any model trained on it (a minimal sketch of the DS heuristic follows the list below).

  1. ReQuest (Wu et al., 2018)
    • Use indirect supervision from external QA corpus
  2. Deep RL (Qin et al., 2018)
    • Use a policy-based RL agent to find false positives
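For intuition on where the noise comes from: the classic DS heuristic (in the spirit of Mintz et al., 2009) labels every sentence that mentions both arguments of a KB triple as a positive example for that relation. A minimal sketch with made-up data:

```python
# Known KB triples: (subject, relation, object).
kb = [("Angela Merkel", "birthPlace", "Hamburg")]

corpus = [
    "Angela Merkel was born in Hamburg.",       # correct match
    "Angela Merkel gave a speech in Hamburg.",  # false positive!
]

# DS heuristic: any sentence containing both entities is labelled
# as expressing the relation, hence the noisy training data.
training_data = [
    (sentence, rel)
    for sentence in corpus
    for subj, rel, obj in kb
    if subj in sentence and obj in sentence
]
for sentence, rel in training_data:
    print(rel, "|", sentence)
# birthPlace | Angela Merkel was born in Hamburg.
# birthPlace | Angela Merkel gave a speech in Hamburg.  <- noise
```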

Inclusion of Structured Information From Subgraphs

The main idea of this research challenge is that subgraphs, either generated from the input query or retrieved from a KB using the input query, contain useful structural information. This information can be leveraged to perform KGQA more accurately.

  1. RDAS (Wang et al., 2021)
    • Incorporate information direction within reasoning
  2. GGNN for SP (Sorokin and Gurevych, 2018)
    • Integrate the structure of the semantic query
  3. MHGRN (Feng et al., 2020)
    • Capture relations between entities using a Graph Relation Network

Hybrid Question-Answering

The hybrid QA challenge involves answering questions by consulting not only a KB but also external, often natural-language, textual sources. This can be especially helpful in domains where knowledge is not readily available in triple form. This challenge overlaps with the Incomplete KG challenge.

  1. HCqa (Asadifar et al., 2019)
    • Extract knowledge from text using linguistic patterns
  2. QARDTE (Zheng et al., 2018)
    • A neural network with an attention mechanism that extracts question-dependent features from unstructured text for use during candidate re-ranking
  3. HybQA (Mohamed et al., 2017)
    • Filter answers using Wikipedia as external source

New and Unseen Domains

Sidiropoulos et al. (2020) define an unseen domain as a domain for which facts exist in a given KB/KG but are absent from the training data.

  1. Representation Adapter (Wu et al., 2019)
    • Use an adapter to map from general purpose representations to task specific ones (model-centric)
  2. Synthetic Data (Sidiropoulos et al., 2020)
    • Generation of synthetic training data (distant supervision) for new, unseen domains (data-centric)

Integration of Language Models for Relation Extraction

Pre-trained language models capture knowledge in a general sense, which means they can struggle in situations where structured or factual knowledge is required (Kassner and Schütze, 2020). Using language models alone for KGQA can therefore lead to poor performance. However, combining language models with structural information from KGs can lead to better question understanding and higher accuracy (Yasunaga et al., 2021).

  1. Transfer Learning with BERT (Lukovnikov et al., 2020)
    • Use BERT to predict the relation of the input question (a minimal sketch follows this list)
  2. QA-GNN (Yasunaga et al., 2021)
    • Integrate QA context with KG subgraphs
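As a rough illustration of the BERT-based approach, relation prediction can be framed as sequence classification over a fixed relation inventory. The sketch below uses the Hugging Face transformers library; the tiny label set is made up, and the required fine-tuning on (question, relation) pairs is omitted, so this is not Lukovnikov et al.'s exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative label set: one class per KB relation (tiny on purpose).
relations = ["birthPlace", "spouse", "almaMater"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# NOTE: the classification head is randomly initialised here and
# would need fine-tuning on (question, relation) pairs first.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(relations)
)

inputs = tokenizer("Where was Angela Merkel born?", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(relations[logits.argmax(dim=-1).item()])  # meaningless before fine-tuning
```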

Candidate Generation

Generating a set of relation candidates for an input query is challenging: the right candidates must be found while the candidate set is kept small. Furthermore, the candidates must be ranked correctly in order to retrieve the correct answer. The following research addresses these problems.

  1. UHOP (Chen et al., 2019)
    • Lifting the limit of hops without increasing the candidate set's size
  2. Incremental Sequence Matching (Lan et al., 2019)
    • Iterative candidate path generation and pruning
  3. Retrieve and Re-rank (Wang et al., 2021)
    • Build an inverted index, retrieve a candidate set with TF-IDF, and re-rank the candidates using BERT (see the sketch after this list)
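A minimal sketch of the retrieve step, assuming scikit-learn for TF-IDF: each relation is represented by a short textual "document" (made up here), candidates are scored against the question, and the BERT re-ranking step is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative "documents": one textual description per KB relation.
relation_docs = {
    "birthPlace": "place of birth born location",
    "spouse": "married to wife husband spouse",
    "almaMater": "studied at university college education",
}

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(relation_docs.values())

# Retrieve: score all relations against the question and keep the
# top k as the candidate set (re-ranking with BERT would follow).
question = "where was angela merkel born"
scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
ranked = sorted(zip(relation_docs, scores), key=lambda x: -x[1])
print(ranked[:2])  # birthPlace scores highest ("born" overlaps)
```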

Low Relation Extraction Accuracy

The goal of the following research is to increase the accuracy of RE.

  1. Multi-View Matching (Yu et al., 2018)
    • Match the input question to multiple views from the KG to capture more information
  2. Masking Mechanism (Chen et al., 2018)
    • Set a hop limit of 2 to hide faraway relations, which are likely irrelevant (a sketch follows this list)
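To make the hop-limit idea concrete, here is a toy breadth-first search that collects only the relations reachable within two hops of the topic entity; the graph and the limit are illustrative.

```python
from collections import deque

# Tiny illustrative KG: entity -> list of (relation, neighbour).
kg = {
    "Angela_Merkel": [("birthPlace", "Hamburg"), ("spouse", "Joachim_Sauer")],
    "Hamburg": [("country", "Germany")],
    "Germany": [("capital", "Berlin")],
}

def relations_within_hops(kg, start, max_hops=2):
    """BFS that masks out relations farther than max_hops away."""
    seen, candidates = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # relations beyond the hop limit stay hidden
        for relation, neighbour in kg.get(node, []):
            candidates.add(relation)
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return candidates

print(relations_within_hops(kg, "Angela_Merkel"))
# {'birthPlace', 'spouse', 'country'} -- 'capital' (3 hops away) is masked
```

The hop limit trades recall for precision: relations outside the radius can never be proposed, which keeps the candidate set small but assumes the answer lies close to the topic entity.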
