synthetic-dataset-generation

Here are 199 public repositories matching this topic...

privateai / deid-examples

Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.

Updated May 30, 2024
Jupyter Notebook

argilla-io / distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.

python ai openai synthetic-data synthetic-dataset-generation huggingface llms rlhf rlaif

Updated May 30, 2024
Python

neel-dey / AnyStar

[WACV 2024] AnyStar: Domain randomized universal star-convex 3D instance segmentation

medical-imaging generative-model segmentation microscopy instance-segmentation synthetic-data synthetic-dataset-generation domain-randomization

Updated May 29, 2024
Python

davanstrien / awesome-synthetic-datasets

awesome synthetic (text) datasets

ai awesome-list datasets synthetic-data synthetic-dataset-generation llms

Updated May 29, 2024
Jupyter Notebook

The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, an area we are actively working on. This repository is actively maintained, and new features are continuously being added.

biases synthetic-dataset-generation layoutlm synthetic-dataset layoutxlm token-classification layoutlmv3 layoutlmv2 llms-benchmarking

Updated May 29, 2024
Python

ucl-cssb / MIMIC

Modelling and Inference of MICrobiomes Project (MIMIC) is a Python package dedicated to simulating, modelling, and predicting microbial community interactions.

biology synthetic-biology modeling computational-biology microbial-communities microbiome regression-models consortium synthetic-dataset-generation

Updated May 30, 2024
Python

wlinds / tpme

This Person Might Exist - Synthetic dataset generation

synthetic-dataset-generation

Updated May 28, 2024
Jupyter Notebook

inductiva / inductiva

Large scale simulations made simple.

python api open-source simulation hpc molecular-dynamics fluid-dynamics synthetic-dataset-generation coastal-dynamics dualsphysics reef3d splishsplash

Updated May 29, 2024
Python

smutnyjw / training_cv_models_on_ue_images_jws

This repository contains documents and source code related to John W. Belanger Smutny's Virginia Tech Master of Engineering senior capstone project, "The Effect of Training Published Computer Vision Models on Unreal Engine Synthetic Images". An introduction to the synergy between graphics rendering software and machine learning.

synthetic-images computervision unreal-engine-4 synthetic-dataset-generation

Updated May 27, 2024
Python

DanieleBertagnoli / SyntheticVideoGeneration

This project allows users to generate synthetic videos from CAD models, including .npy files with additional information. Models are loaded dynamically into a Blender scene, and the camera smoothly moves along spherical points to create the final video.

video blender-scripts blender blender-3d blender-python synthetic-data cad-models synthetic-dataset-generation ycb ycb-video

Updated May 27, 2024
Python

Jns-M / patGAN

Generate synthetic clinical study data in the form of individual patients.

python generative-adversarial-network gan clinical-trials synthetic-data synthetic-dataset-generation clinical-study framingham-heart-study

Updated May 23, 2024
Python

MichiganNLP / depression_synthetic_data

Can LMs generate useful synthetic data for the mental health domain?

mental-health synthetic-dataset-generation depression-analysis llm

Updated May 20, 2024
Jupyter Notebook

Eladlev / AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration

synthetic-dataset-generation prompt-tuning prompt-engineering

Updated May 17, 2024
Python

kevinscaria / TarGEN

Targeted Data Generation with Large Language Models

nlp alignment nlp-machine-learning model-alignment synthetic-dataset-generation datagen datageneration large-language-models llm chatgpt

Updated May 15, 2024
Jupyter Notebook

dieterich-lab / ASyH

The Anonymous Synthesizer for Health Data

gaussian-mixture-models health-data variational-autoencoder anonymization synthetic-data synthetic-dataset-generation ml-pipeline ctgan synthetic-data-vault

Updated May 13, 2024
Python

clugen / pyclugen

Multidimensional cluster generation in Python

python python-library python3-library multidimensional-data synthetic-clusters synthetic-dataset-generation synthetic-data-generator multidimensional-clusters

Updated May 12, 2024
Python

aanchal898 / CityZen-Automating-govt-citizen-issue-reporting

To enhance the interaction between local governments and the communities they represent, we leveraged the power of AI to simplify the complaint management process for government entities.

clustering classification topic-modeling synthetic-dataset-generation sbert