Multilingual text segmentation and translation using state-of-the-art language models. This project leverages the power of several advanced models from Hugging Face and others.
Updated Jun 6, 2024 - Python
Python script for segmenting Chinese text into individual words and translating them into English
Desktop app for automatically translating comics (BDs, manga, manhwa, fumetti, and more) in a variety of formats (image, PDF, EPUB, CBR, CBZ, etc.) and in multiple languages.
"WBSUBNdb_text: Bangla handwritten text document dataset" is a Bangla text dataset containing 1383 offline handwritten text documents contributed by 190 writers. The dataset is composed of both simple and compound characters.
An npm package specializing in Natural Language Processing, for developing AI systems that can understand and generate natural language.
Text Processing & Segmentation Framework
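Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods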
Split text into chars, words, or sentences from the command line.
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
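The Symmetric Delete idea named in the two entries above can be illustrated with a small sketch: instead of generating every insertion, substitution, and transposition of a query, only deletions are generated, both for the dictionary terms (ahead of time) and for the query (at lookup). The function names `deletes`, `build_index`, and `candidates` below are hypothetical and are not the API of SymSpell or its Python port. The sketch also stops at candidate retrieval; a real implementation would still verify each candidate with a true edit-distance check and rank by frequency.

```python
def deletes(word, max_distance):
    """Return word plus every string reachable by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        # Delete one more character from every string produced in the previous round.
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(dictionary, max_distance=2):
    """Map each delete variant to the dictionary terms that produce it (hypothetical helper)."""
    index = {}
    for term in dictionary:
        for variant in deletes(term, max_distance):
            index.setdefault(variant, set()).add(term)
    return index


def candidates(query, index, max_distance=2):
    """Collect dictionary terms sharing a delete variant with the query.

    These are candidates only: they still need an edit-distance check.
    """
    found = set()
    for variant in deletes(query, max_distance):
        found |= index.get(variant, set())
    return found
```

For example, with an index built from `["hello", "help", "world"]` at distance 1, the query `"helo"` shares the variant `"helo"` with `"hello"` and the variant `"hel"` with `"help"`, so both are returned as candidates.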
Accelerated deep learning R&D
Automatic Manga Translator
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Text segmentation into separate words using a simple unigram model and the Viterbi algorithm
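As a rough illustration of the approach this entry describes, unigram segmentation with Viterbi can be sketched as a dynamic program over prefix positions: the best score for a prefix is the best score of a shorter prefix plus the log-probability of the word that follows it. This is a minimal sketch and not the repository's code; the function name `segment`, the `max_word_len` parameter, and the length-based penalty for unseen strings are all assumptions made for the example.

```python
import math


def segment(text, unigram_counts, max_word_len=20):
    """Split text into the most probable word sequence under a unigram model."""
    total = sum(unigram_counts.values())

    def log_prob(word):
        count = unigram_counts.get(word)
        if count is None:
            # Assumed smoothing: unseen strings get a penalty that grows with length.
            return math.log(1.0 / (total * 10 ** len(word)))
        return math.log(count / total)

    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            score = best[j] + log_prob(text[j:i])
            if score > best[i]:
                best[i] = score
                back[i] = j

    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

With toy counts such as `{"the": 50, "cat": 10, "sat": 10, "at": 20, "cats": 2}`, the input `"thecatsat"` is split into `the`, `cat`, `sat`, because that path has a higher total log-probability than `the`, `cats`, `at`.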
Java implementation of UAX#29 text segmentation algorithm
Repo for the paper "Grounded Complex Task Segmentation for Conversational Assistants" presented at SIGDIAL 2023
The work that was part of my Master's Thesis Project spring 2023
Tajik text segmentation algorithms