BlackLab Frontend, a feature-rich corpus search interface for BlackLab.
Updated May 28, 2024 - TypeScript
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
📑 Galician corpus for misogyny detection
Thai News Dataset from Thai government website.
We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/
This repository houses a comprehensive collection of 14,701 Instagram posts authored by Italian university students between January 2020 and December 2020. These posts offer invaluable insights into the experiences and reflections of students during the challenging period of the COVID-19 lockdown in Italy.
A collection of encoded archival description XML documents for text and content analysis.
Estonian Grammatical Error Correction (GEC) test and development corpus that contains L2 learner texts error-annotated in the M2 format.
Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
A very simple news crawler with a funny name
Linguistic search for large annotated text corpora, based on Apache Lucene
A parser for annotated MuseScore 3 files.
Voice activity detection and speaker gender segmentation audiovisual corpus
Radio Audio Corpus Collection Toolkit with HackRF One.
中文语言理解测评基准 Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard