Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Updated Jun 12, 2024 - HTML
A simple, functional multi-format conversion system built on a number of Python libraries
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
C# and VB.NET samples for Docotic.Pdf library
CLI for extracting text from PDF files (and possibly tables)
This is a Python application that converts non-readable PDF files, such as scanned documents, into readable Word documents. It first converts the PDF pages into images, then extracts the text from those images to create the Word documents. The application provides a user-friendly interface for this task.
Converts a PDF file into a text file while keeping the layout of the original PDF.
Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library
Sample code for the Datalogics .NET interface of the Adobe PDF Library
PDF text data extraction web app with OCR for scanned documents
Sample code for the Datalogics Java interface of the Adobe PDF Library, set up to build with Maven
I/O for nocodefunctions: CSV, TXT, PDF, and XLSX so far
Sample code for the Datalogics C++ interface of the Adobe PDF Library
Aspose.PDF for JavaScript via C++
Python script to translate a PDF file to DOCX or ODT
The code base of the front-end of nocodefunctions.com
Build a RAG preprocessing pipeline
Python project that converts tables inside PDFs to CSV for convenient data manipulation. It has log and exception handling.
Convert PDFs to text, then transform that text into structured JSON objects for Threat Intelligence.
Extract structured text and data from documents such as invoices, book pages, and tables, using OpenCV and Tesseract OCR
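
Several of the projects listed above share a two-stage shape: extract plain text from a PDF, then turn that text into structured records. The second stage can be sketched as follows, assuming the text has already been extracted by some other tool. The function names `extract_indicators` and `to_json` and the three regular expressions are hypothetical, chosen for illustration only; they are not taken from any of the listed repositories.

```python
import json
import re

# Hypothetical, deliberately loose patterns for illustration only.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract_indicators(text: str) -> dict:
    """Collect unique matches per indicator type, sorted for stable output."""
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "cve": sorted(set(CVE_RE.findall(text))),
        "sha256": sorted(set(SHA256_RE.findall(text))),
    }


def to_json(text: str) -> str:
    """Serialize the extracted indicators as a JSON object."""
    return json.dumps(extract_indicators(text), indent=2)


if __name__ == "__main__":
    sample = "Host 10.0.0.5 exploited CVE-2021-44228; also 10.0.0.5 again."
    print(to_json(sample))
```

The IPv4 pattern does not check that each octet is at most 255, so a real pipeline would likely validate matches further before trusting them.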