
Licenses for datasets

Ricardo Usbeck edited this page Mar 23, 2023 · 1 revision

ACE2004

Task | Type | License | Language
A2KB | news | LDC | en

AIDA/CoNLL

Task | Type | License | Language
A2KB | news | CoNLL Licence | en
gerbil_data/datasets/aida/AIDA-YAGO2-dataset-update.tsv

The adapter also works with the original AIDA-YAGO2-dataset.tsv file. The main difference between the original and the updated file appears to be that YAGO URL paths were replaced with IDs; our adapter does not use these values.

AQUAINT

Task | Type | License | Language
A2KB | news | LDC User Agreement for Non-Members | en
  • https://catalog.ldc.upenn.edu/LDC2002T31
  • Graff, D. 2002. The AQUAINT corpus of English news text. Technical report, Linguistic Data Consortium, Philadelphia, PA, USA.
  • This dataset is not included in gerbil_data.zip. Installation: the AQUAINT adapter expects the following folders:
gerbil_data/datasets/AQUAINT/RawTexts
gerbil_data/datasets/AQUAINT/Problems
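Since AQUAINT has to be installed by hand, a quick check that both folders are in place can save a failed run. The following Python sketch is not part of GERBIL. It assumes the paths are resolved relative to the directory GERBIL is started from, and it treats an empty folder as missing, which is an assumption of this sketch rather than documented adapter behavior. The function name `missing_aquaint_dirs` is hypothetical.

```python
from pathlib import Path

# Folders the AQUAINT adapter expects, relative to the directory GERBIL runs in (assumed)
AQUAINT_DIRS = [
    "gerbil_data/datasets/AQUAINT/RawTexts",
    "gerbil_data/datasets/AQUAINT/Problems",
]


def missing_aquaint_dirs(root="."):
    """Return the expected AQUAINT folders that are absent or empty under root.

    Treating an empty folder as missing is an assumption of this sketch.
    """
    missing = []
    for rel in AQUAINT_DIRS:
        folder = Path(root) / rel
        # is_dir() is False for absent paths; any(iterdir()) is False for empty folders
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_aquaint_dirs()
    print("AQUAINT folders found" if not problems else f"Missing or empty: {problems}")
```

Run it from the directory that contains `gerbil_data`; an empty result means both folders exist and contain at least one entry.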

DBpedia Spotlight Corpus

Task | Type | License | Language
A2KB | news | CC BY 4.0 | en

Derczynski

Task | Type | License | Language
A2KB | microposts | CC BY 4.0 | en

IITB

Task | Type | License | Language
A2KB | mixed | Public Domain | en

KORE 50 (NIF)

Task | Type | License | Language
A2KB | news | CC BY 4.0 | en
  • http://www.yovisto.com/labs/ner-benchmarks/
  • J. Hoffart, S. Seufert, D. B. Nguyen, M. Theobald, and G. Weikum, "KORE: Keyphrase Overlap Relatedness for Entity Disambiguation," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), Hawaii, USA, 2012.
  • This dataset is already included in gerbil_data.zip

Microposts2013

Task | Type | License | Language
RT2KB | microposts | CC BY-NC-SA 3.0 | en
gerbil_data/datasets/microposts2013/goldStandard.tsv
gerbil_data/datasets/microposts2013/testSet.tsv
gerbil_data/datasets/microposts2013/TweetsTrainingSetCH.tsv

Microposts2014

Task | Type | License | Language
A2KB | microposts | Twitter license | en
gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTestSet.csv
gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTrainingSet.csv

Microposts2015

Task | Type | License | Language
A2KB | microposts | CC BY 4.0 | en
  • This dataset is not yet included in gerbil_data.zip and still needs to be added. Installation: the implemented adapter expects the following files:
gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-gold_v3.tsv
gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-tweets.tsv
gerbil_data/datasets/microposts2015/test/NEEL2015-test-gold_v2.tsv
gerbil_data/datasets/microposts2015/test/NEEL2015-test-tweets.tsv
gerbil_data/datasets/microposts2015/training/NEEL2015-training-gold_v4.ts
gerbil_data/datasets/microposts2015/training/NEEL2015-training-tweets_v2.tsv

Microposts2016

Task | Type | License | Language
A2KB | microposts | CC BY 4.0 | en
  • This dataset is not yet included in gerbil_data.zip and still needs to be added. Installation: the implemented adapter expects the following files:
gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev.tsv
gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev_neel.gs
gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test.tsv
gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test_neel.gs
gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training.tsv
gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training_neel.gs
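The Microposts2016 folder names contain spaces (`Dev Set`, `Test Set`, `Training Set`), which makes them easy to mistype or to split accidentally when copying through a shell. A hypothetical Python check that rebuilds the six expected paths from the split names and reports any that are absent could look like this. It is not part of GERBIL, and it assumes it is run from the directory that contains `gerbil_data`; the names `expected_files` and `missing_files` are illustrative only.

```python
from pathlib import Path

# Base folder of the Microposts2016 files, relative to GERBIL's working directory (assumed)
BASE = Path("gerbil_data/datasets/microposts2016")

# Split folder (note the spaces) -> file stem used inside that folder
SPLITS = {
    "Dev Set": "NEEL2016-dev",
    "Test Set": "NEEL2016-test",
    "Training Set": "NEEL2016-training",
}


def expected_files():
    """Return the tweet file (.tsv) and gold-standard file (_neel.gs) of every split."""
    files = []
    for folder, stem in SPLITS.items():
        files.append(BASE / folder / f"{stem}.tsv")
        files.append(BASE / folder / f"{stem}_neel.gs")
    return files


def missing_files(root="."):
    """Return the expected files that do not exist as regular files under root."""
    return [str(p) for p in expected_files() if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    for path in missing_files():
        print("missing:", path)
```

When inspecting these folders on the command line, quote the paths, for example `ls "gerbil_data/datasets/microposts2016/Dev Set"`.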

MSNBC

Task | Type | License | Language
A2KB | news | - | en

N3-Reuters128

Task | Type | License | Language
A2KB | news | CC BY-NC-SA 4.0 International | en

N3-RSS500

Task | Type | License | Language
A2KB | RSS feeds | CC BY-NC-SA 4.0 International | en

Ritter

Task | Type | License | Language
RT2KB | microposts | GNU GPL v3 | en

Senseval 2

Task | Type | License | Language
ERec | mixed | Public Domain | en

Senseval 3

Task | Type | License | Language
ERec | mixed | Public Domain | en

TwitterNEED

Brian Collection

Task | Type | License | Language
A2KB | microposts (Twitter) | CC-BY(?) | en
  • Locke, B. and Martin, J. (2009). Named entity recognition: Adapting to microblogging. Senior Thesis, University of Colorado.

Mena Collection

Task | Type | License | Language
A2KB | microposts (Twitter) | CC-BY(?) | en
  • Habib, M. B. and van Keulen, M. (2012). Unsupervised improvement of named entity extraction in short informal context using disambiguation clues. In Proceedings of the Workshop on Semantic Web and Information Extraction (SWAIE 2012), pages 1–10.

UMBC

Task | Type | License | Language
RT2KB | microposts | BSD 2-Clause | en

WSDM

Task | Type | License | Language
C2KB | microposts | - | en