data-quality

Here are 300 public repositories matching this topic...

wosaku / data-profiling-mask-analyzer

Python function to generate a mask analysis

python data-quality data-profiling mask-analysis mask-analyzer

Updated Jul 22, 2017
Jupyter Notebook

majyphi / spark-switchpoint

Simple Spark wrapper for validating data

data spark hacktoberfest data-quality dataquality hacktoberfest2020

Updated Oct 17, 2020
Scala

interzoid / fullnamematchscore-go

Generates a match score from 0 to 100 (100 being the closest match) indicating how closely two full person names match. The scoring is based on a series of tests, algorithms, AI, and an ever-growing body of machine-learning-generated knowledge.

data-science machine-learning ai scoring data-analytics data-quality data-assessment name-matching data-assets

Updated Sep 17, 2023
Go

grahman20 / FIMUS

FIMUS imputes numerical and categorical missing values by using a data set’s existing patterns including co-appearances of attribute values, correlations among the attributes and similarity of values belonging to an attribute.

data-science data-mining correlation missing-data similarity-measures preprocessing data-cleaning data-quality data-cleansing missing-values missing-value-handling missing-data-imputation missing-value-imputation co-appearance

Updated Mar 24, 2023
HTML

ftoschi14 / DIQ-2023-Toschi-Spreafico

Project for the "Data and Information Quality" course at Politecnico di Milano - AY 2023/2024 - Data Issues: Duplication, Variable Types - ML Task: Classification

polimi data-quality politecnico-di-milano diq data-and-information-quality data-pollution diq-project

Updated Feb 23, 2024
Jupyter Notebook

jmw86069 / jamma

Jam MA-plots, volcano plots, other relevant genomics visualizations

visualization omics data-quality ma-plots

Updated Jun 26, 2023
R

bballamudi / Optimus

🚚 Agile Data Science Workflows made easy with Pyspark

pyspark data-quality data-profiling

Updated Oct 27, 2019
Jupyter Notebook

gurol / dsprofiling

DsProfiling – Dataset Profiling

data-science machine-learning sparsity big-data dataset density profiling malware-samples descriptive-statistics data-quality quantitative-analysis malware-detection data-quality-measurement

Updated Apr 1, 2019
R

miriamspsantos / data-typology

Implementation of data typology for imbalanced datasets.

data-science machine-learning matlab data-quality imbalanced-data data-complexity imbalanced-learning meta-learning data-centric-ai data-centric-machine-learning

Updated Jun 4, 2023
MATLAB

lukasgabor / SDMs-affected-by-positional-uncertainty-in-occurrences-can-still-be-ecologically-interpretable

This repository provides R scripts for reproducing the virtual species generation, the species distribution modeling, and the final figures from the associated published manuscript.

species-distribution-modelling data-quality data-assessment model-interpretability

Updated Jun 13, 2023
R

emersonleaojr / capgemini-aceleracao-pyspark

Capgemini PySpark Acceleration 2022

spark data-transformation pyspark data-engineering data-quality

Updated Apr 20, 2022
Jupyter Notebook

theidari / fraud_anomaly_detection

This GitHub repository provides a comprehensive set of tools and algorithms for detecting fraud anomalies in various data sources. Fraudulent activities can have severe consequences, impacting businesses and individuals alike. With this repository, we aim to empower researchers with effective techniques to identify and prevent fraudulent behavior.

banking tableau data-quality fraud-detection anomaly-detection

Updated Aug 16, 2023
HTML

bruno-uy / dbt-data-quality

Data quality checks in your dbt flow

data dbt data-quality dbt-tests dbt-packages

Updated Sep 15, 2023
Jupyter Notebook

a-chumagin / soda-contract-poc

PoC for Soda Contracts against Vertica DB

soda data-quality data-governance data-contracts

Updated Mar 1, 2024
Python

hadarsharon / compars

DataFrame comparison done right, powered by Rust with polars (AKA the bear-agnostic 🐻 🐼 🐨 🐻‍❄️ DataFrame comparison library)

python rust spark pandas pyspark data-engineering dataframe dataframes data-quality data-profiling koalas polars

Updated May 13, 2024
Python

data-integrations / data-profiler

Profiles the fields to generate statistics on each column specified.

profiler cdap data-quality cdap-plugin

Updated Feb 16, 2023
Java

gurol / DsFeatFreqComp

DsFeatFreqComp – Dataset Feature-Frequency Comparison R Package

visualization android package machine-learning r p-value comparison dataset datasets feature-engineering binary-classification data-quality quantitative-analysis malware-detection normal-distribution kruskal-wallis shapiro-wilk

Updated Dec 29, 2020
R

Haighton / KB_related_stuff

Scripts I wrote at my job which could be helpful to others

qa python3 data-quality-checks data-quality alto-xml mets-xml

Updated Jul 22, 2020
Python

ymgan / data-fairy

Guidelines to help you manage your Antarctic biodiversity data

data-management tips-and-tricks data-quality

Updated Nov 25, 2021
R

AlexLuevano / DataGovernance

This is a tool developed in Python to assist with the data governance process, particularly during the migration project Mainframe>MDM>PIC. The team checks the integrity of the data and evaluates whether business rules are being fulfilled by synchronizing the data between the MDM platform and the current item information on Mainframe. This tool's purpose…

mdm data-quality data-governance

Updated Nov 24, 2021
Jupyter Notebook
