# data-cleaning

There are 2,795 public repositories matching this topic.

justmarkham / pandas-videos

Jupyter notebook and datasets from the pandas video series

python data-science tutorial jupyter-notebook pandas data-analysis data-cleaning

Updated Mar 5, 2024
Jupyter Notebook

justmarkham / DAT8

General Assembly's 2015 Data Science course in Washington, DC

Updated Oct 6, 2022
Jupyter Notebook

cleanlab / cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Updated May 23, 2024
Python

voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models

visualization python data-science machine-learning computer-vision deep-learning artificial-intelligence developer-tools image-classification object-detection data-cleaning active-learning data-quality data-curation unstructured-data vector-search data-centric-ai

Updated May 23, 2024
Python

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

testing schema validation data-validation pandas-dataframe assertions pandas testing-tools data-processing dataframes data-cleaning hypothesis-testing data-verification pandas-validation data-check data-assertions dataframe-schema pandas-validator

Updated May 23, 2024
Python

hi-primus / optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

data-science machine-learning spark bigdata data-transformation pyspark data-extraction data-analysis data-wrangling dask data-exploration data-preparation data-cleaning data-profiling data-cleansing big-data-cleaning data-cleaner cudf dask-cudf

Updated May 20, 2024
Python

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Updated May 21, 2024
Go
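"Name-indexed" in the Miller description means that fields are addressed by their header names, not by column positions as in classic `awk` or `cut`. The sketch below shows that idea with Python's standard `csv` module. It is not Miller: the sample data, field names, and the `sort_and_cut` helper are hypothetical. It sorts rows numerically (descending) by a named field and then keeps only the named fields. In Miller, the same pass would look roughly like `mlr --icsv --opprint sort -nr score then cut -f name,score file.csv`.

```python
import csv
import io

# Hypothetical sample data, not taken from any repository listed here.
RAW = """name,city,score
ana,Lima,7
bo,Oslo,9
cy,Lima,5
"""


def sort_and_cut(text, sort_field, keep_fields):
    """Sort CSV rows numerically (descending) by a named field,
    then keep only the named fields, in the order given."""
    rows = list(csv.DictReader(io.StringIO(text)))  # each row is a dict keyed by header name
    rows.sort(key=lambda r: float(r[sort_field]), reverse=True)
    return [{f: r[f] for f in keep_fields} for r in rows]


if __name__ == "__main__":
    for row in sort_and_cut(RAW, "score", ["name", "score"]):
        print(row)
```

Every operation refers to `"score"` and `"name"`, not to column numbers. Because of that, the code keeps working if the input's columns are reordered.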

LibraryCarpentry / lc-open-refine

Library Carpentry: OpenRefine

openrefine english stable lesson data-management library-carpentry data-cleaning carpentries

Updated May 21, 2024

sfirke / janitor

simple tools for data cleaning in R

data-science r excel spss tidyverse pivot-tables data-analysis data-cleaning dirty-data tabulations

Updated May 23, 2024
R

datacarpentry / OpenRefine-ecology-lesson

Data Cleaning with OpenRefine for Ecologists

openrefine english stable lesson data-management ecology data-cleaning data-carpentry carpentries

Updated May 21, 2024

rasgointelligence / feature-engineering-tutorials

Data Science Feature Engineering and Selection Tutorials

python data-science machine-learning tutorial jupyter notebook scikit-learn exploratory-data-analysis tutorials pandas feature-selection xgboost feature-engineering features data-cleaning pandas-profiling sweetviz pyrasgo

Updated May 22, 2024
Jupyter Notebook

skrub-data / skrub

Prepping tables for machine learning

data-science data machine-learning data-analysis data-wrangling data-preprocessing data-preparation data-cleaning dirty-data

Updated May 23, 2024
Python

ajaymache / data-analysis-using-python

Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle

data-science exploratory-data-analysis eda data-visualization kaggle-competition data-analytics data-analysis data-wrangling data-cleaning kaggle-dataset data-cleansing data-science-python data-analysis-python kaggle-used-cars-dataset

Updated Jan 2, 2019
Jupyter Notebook

jim-schwoebel / voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

visualization security data machine-learning server voice python3 voice-recognition generation transcription voice-control data-cleaning voice-assistant encryption-decryption voice-recording voice-activity-detection wake-word-detection featurization voice-computing

Updated Dec 8, 2022
Python

data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

visualization nodejs javascript linq json data csv pandas data-visualization data-analysis data-wrangling data-management data-manipulation data-cleaning data-munging data-cleansing data-forge

Updated Mar 13, 2024
TypeScript

ECNU-ICALK / EduChat

An open-source Chinese-English educational chat model (large language model) from ICALK, East China Normal University (general-purpose base model, GPU deployment, data cleaning). Acknowledgments: LLaMA, MOSS, BELLE, Ziya, vLLM

education chinese-nlp llama data-cleaning moss open-models belle llm

Updated Dec 25, 2023
Python

Desbordante / desbordante-core

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.

data-science data-mining exploratory-data-analysis tabular-data feature-selection data-engineering feature-extraction data-analytics knowledge-discovery data-wrangling data-preprocessing feature-engineering spreadsheets data-exploration data-mining-algorithms data-cleaning data-profiling anomaly-detection data-cleansing correlations

Updated May 23, 2024
C++

akanz1 / klib

Easy to use Python library of customized functions for cleaning and analyzing data.

python data-science data-visualization feature-selection data-analysis klib data-preprocessing data-cleaning

Updated May 21, 2024
Python

datacarpentry / openrefine-socialsci

OpenRefine for Social Science Data

openrefine english stable lesson data-management social-sciences data-cleaning hacktoberfest data-carpentry carpentries

Updated May 21, 2024

schema-inspector / schema-inspector

Schema-Inspector is a simple JavaScript object sanitization and validation module.

javascript sanitization validation data-cleaning

Updated Mar 12, 2024
JavaScript
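Schema-Inspector is described as a module that sanitizes and validates objects. The sketch below is a language-neutral illustration of that two-step pattern in Python, not Schema-Inspector's API. The schema format and the `sanitize`/`validate` functions are hypothetical. The first step trims strings and coerces each known field to its declared type. The second step checks the cleaned record and returns human-readable errors.

```python
from typing import Any

# Hypothetical schema: field name -> (target type, validation predicate, error message).
SCHEMA = {
    "name": (str, lambda v: 1 <= len(v) <= 50, "name must be 1-50 characters"),
    "age": (int, lambda v: 0 <= v <= 150, "age must be between 0 and 150"),
}


def sanitize(record: dict[str, Any]) -> dict[str, Any]:
    """Trim strings and coerce each known field to its target type when possible.
    Unknown fields are dropped; values that cannot be coerced are left unchanged."""
    clean: dict[str, Any] = {}
    for field, (typ, _check, _msg) in SCHEMA.items():
        if field not in record:
            continue
        value = record[field]
        if isinstance(value, str):
            value = value.strip()
        try:
            clean[field] = typ(value)
        except (TypeError, ValueError):
            clean[field] = value  # keep the raw value; validate() will report it
    return clean


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of error messages; an empty list means the record is valid."""
    errors: list[str] = []
    for field, (typ, check, msg) in SCHEMA.items():
        if field not in record:
            errors.append(f"{field} is required")
        elif not isinstance(record[field], typ) or not check(record[field]):
            errors.append(msg)
    return errors


if __name__ == "__main__":
    good = sanitize({"name": "  Ada ", "age": "36"})
    print(good, validate(good))
    bad = sanitize({"name": "", "age": "abc"})
    print(bad, validate(bad))
```

The sketch sanitizes before it validates. As a result, input such as `"36"` or `"  Ada "` is accepted after cleanup, and only values that stay invalid after coercion are reported.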
