# ocr

Here are 4,687 public repositories matching this topic...

ispras / dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)

html pdf ocr table-of-contents excel html-parser docx documents doc scanned-documents txt document-analysis odt pdf-parser table-recognition docx-parser document-content-extraction logical-structure-extraction

Updated Jun 5, 2024
Python

InternLM / HuixiangDou

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

application ocr robot pipeline dsl chatbot wechat assistance lark multimodal rag llm

Updated Jun 5, 2024
Python

siphyshu / vitb-timetable-parser

🔎 Parse VITB timetable screenshots to csv/json

ocr table-extraction

Updated Jun 5, 2024
Jupyter Notebook

mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

ocr deep-learning pytorch text-recognition text-detection optical-character-recognition text-detection-recognition tensorflow2 document-recognition

Updated Jun 5, 2024
Python

hertzg / tesseract-server

A small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizing the power of Google Tesseract.

api docker typescript ocr docker-compose containers rest-api docker-image container image-processing tesseract http-server hacktoberfest tesseract-server

Updated Jun 5, 2024
TypeScript

PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

ocr db crnn ocrlite chineseocr

Updated Jun 5, 2024
Python

jackfsuia / LLM-Data-Cleaner

Use LLMs to batch-process data: generate or clean data for academic use. Supports OCR with various LLMs, including Qwen (Tongyi Qianwen), Moonshot, PaddleOCR, OpenAI, and LLaVA.

ocr dataset llm

Updated Jun 5, 2024
Python

eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.

nlp pdf spa ocr scala elm self-hosted webapp document stanford-corenlp dms document-management personal-document-system edms document-management-system docspell

Updated Jun 5, 2024
Elm

hiroi-sora / PaddleOCR-json

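Offline OCR command-line program for Windows that recognizes text in images and outputs the results as a JSON string, making it easy for other programs to call. APIs are provided for various languages. Compiled from PaddleOCR C++.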

ocr json-api paddlepaddle paddleocr

Updated Jun 5, 2024
C++

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Updated Jun 5, 2024
HTML

lukas-blecher / LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

python machine-learning ocr latex deep-learning image-processing pytorch dataset transformer vit image2text im2text im2latex im2markup math-ocr vision-transformer latex-ocr

Updated Jun 5, 2024
Python

PaddlePaddle / Paddle2ONNX

ONNX Model Exporter for PaddlePaddle

ocr detection deploy classification paddlepaddle onnx onnxruntime ppocr picodet ppyoloe

Updated Jun 5, 2024
Python

RapidAI / RapidOCR

Awesome multi-programming-language OCR toolkits based on ONNXRuntime, OpenVINO, and PaddlePaddle.

ocr crnn openvino dbnet onnxruntime paddleocr chineseocr easyocr rapidocr

Updated Jun 5, 2024
Python

alicewish / MomoTranslator

Pure OpenCV comic translation tool

opencv ocr comic manga chinese-translation auto-translation pyqt6

Updated Jun 5, 2024
Python

kel-z / HSR-Scanner

Scanner for exporting light cone, relic, and character data from Honkai: Star Rail to JSON format.

ocr scanner honkai mihoyo hoyoverse honkai-starrail star-rail starrail honkai-star-rail

Updated Jun 5, 2024
Python

scribeocr / scribeocr

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

ocr abbyy tesseract proofreading

Updated Jun 5, 2024
JavaScript

infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

nlp machine-learning information-retrieval ocr deep-learning chatbot orchestration preprocessing pdf-to-text data-pipelines document-parser rag document-understanding table-structure-recognition llm llmops retrieval-augmented-generation

Updated Jun 5, 2024
Python

SciPhi-AI / R2R

Build and deploy a fully-featured, observable user-facing RAG backend in minutes.

search pdf machine-learning ocr deep-learning retrieval chatbot artificial-intelligence question-answering data-pipelines retrieval-systems large-language-models llm langchain llama-index retrieval-augmented-generation

Updated Jun 5, 2024
HTML

siyuan-note / siyuan

Privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Golang.

electron markdown pdf ocr notebook s3 webdav self-hosted openai note-taking evernote anki knowledge-base obsidian pkm notion notes-app local-first chatgpt

Updated Jun 5, 2024
TypeScript

hiroi-sora / Umi-OCR

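Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in libraries for multiple languages.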

ocr ocr-python paddleocr

Updated Jun 5, 2024
QML
