The SQL/Ibis powered sklearn of record linkage
Updated May 30, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining datasets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
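As a rough illustration of linking records that share no common identifier, the sketch below compares normalized names from two small, made-up sources using Python's standard-library `difflib.SequenceMatcher`. The record layout, the helper names (`normalize`, `similarity`, `link_records`), the sample data, and the 0.85 threshold are all assumptions chosen for the example; this is not how any of the listed projects work.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for every pair scoring at or above threshold.

    Compares every left record with every right record, which is fine for a
    toy example but quadratic in the number of records.
    """
    matches = []
    for l_rec in left:
        for r_rec in right:
            score = similarity(l_rec["name"], r_rec["name"])
            if score >= threshold:
                matches.append((l_rec["id"], r_rec["id"], round(score, 2)))
    return matches


# Hypothetical sources with different keys and different formatting styles.
crm = [
    {"id": "c1", "name": "Halbert L. Dunn"},
    {"id": "c2", "name": "Jane Smith"},
]
billing = [
    {"id": "b7", "name": "halbert l dunn"},
    {"id": "b9", "name": "John Smyth"},
]

matches = link_records(crm, billing)
print(matches)
```

Comparing every pair does not scale beyond small inputs, and a single string-similarity threshold is a crude decision rule. Larger record-linkage jobs usually restrict comparisons to candidate pairs first and combine evidence from several fields.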
A collection of ready-made SQL queries for PostgreSQL covering frequently occurring tasks (retrieving and modifying data, speeding up queries, database maintenance)
Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.
Nightly builds of rustic, rustic_server, and rustic_scheduler
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
A batch manager that deduplicates and batches requests for a given data type made within a time window. Useful for batching requests made from multiple React components that use react-query
Deduplicating archiver with compression and authenticated encryption.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Prometheus Alertmanager
CLI utility to find near duplicate images and remove all but the best copy.
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
React hooks for data fetching
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
Welcome to TheAnimeScripter – the ultimate tool for video processing. Enjoy seamless video processing with our intuitive GUIs for Windows and After Effects.
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
EROFS documentation repo for https://erofs.docs.kernel.org
Fast, secure, efficient backup program
(Mirror repository) -- Poda is a tool to find similar and duplicate content between disconnected storage units. It can be used for connected ones too.
Created by Halbert L. Dunn
Released 1946