text-extraction

Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly build a demonstration application capable of full-text search, eDiscovery, and information governance.

metadata text-extraction full-text full-text-search ravendb ediscovery indexing-engine file-format-detection data-breach file-deduplication pii information-governance-catalog personally-identifiable-information archive-extractor pii-detection file-identification full-text-extraction document-ingestion information-governance

Updated May 28, 2024

nguyen-tho / ID-card-extract-module

deep-learning text-extraction id-card transformer-ocr

Updated May 25, 2024
Python

flairNLP / fundus

A very simple news crawler with a funny name

python nlp rss sitemap crawler scraper corpus text-extraction web-scraping news-crawler commoncrawl web-corpus news-scraping cc-news

Updated May 23, 2024
Python

abhinaba-ghosh / any-text

Get text content from any file

text text-extraction reader file-reader text-extractor

Updated May 18, 2024
JavaScript

zanachka / extruct

Extract embedded metadata from HTML markup

text-extraction html-extraction

Updated May 17, 2024
Python

MRGRD56 / textractor-translator

Translate visual novels in real time

electron javascript games translator typescript translation anime text-extraction visual-novel textractor textractor-extension

Updated May 17, 2024
TypeScript

miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

python nlp pagerank-algorithm text-extraction reduction summarization html-page summary lsa sumy textteaser summarizer html-extraction html-extractor

Updated May 16, 2024
Python

yasminsarkhosh / machine-learning-bsc-thesis-2024

This GitHub repository hosts the notebooks and tools developed as part of this thesis to automate the extraction, processing, and analysis of data from the MICCAI 2023 conference, aiding in the systematic review and providing a structured foundation for further research in this crucial area.

data-science machine-learning data-visualization text-extraction artificial-intelligence healthcare medical-imaging data-analysis datasets annotation-framework data-quality demographic-analysis medical-image-processing miccai pdf-data-extraction medical-ai healthcare-ai miccai2023 medical-ai-project

Updated May 15, 2024
Jupyter Notebook

TYPO3-Solr / ext-tika

A TYPO3 CMS extension that provides Apache Tika functionality

search php metadata cms cms-extension tika language-detection typo3 typo3-cms-extension file-indexing text-extraction

Updated May 16, 2024
PHP

edhou20 / Medical-Texts-NLP-Clustering-

nlp clustering text-extraction dimensionality-reduction vectorization unsupervised-learning

Updated May 13, 2024
Python

real0x0a1 / ocr-opencv

OCR with Tesseract and OpenCV: Extract text from images effortlessly. Preprocess with OpenCV for accuracy. Display results and save output. Easy integration for document digitization and data entry automation.

python opencv machine-learning automation ocr image-processing tesseract text-extraction document-digitization data-entry-automation