rlhf

Here are 116 public repositories matching this topic...

argilla-io / argilla

Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.

nlp machine-learning natural-language-processing ai weak-supervision developer-tools active-learning annotation-tool text-annotation weakly-supervised-learning human-in-the-loop mlops text-labeling gpt-4 llm langchain rlhf

Updated May 29, 2024
Python

tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

nlp deep-learning leaderboard evaluation instruction-following foundation-models large-language-models rlhf

Updated May 29, 2024
Jupyter Notebook

hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Updated May 29, 2024
Python

princeton-nlp / SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

alignment large-language-models rlhf preference-alignment

Updated May 29, 2024
Python

Aligner2024 / aligner

Achieving Efficient Alignment through Learned Correction

alignment aligner llm rlhf weak-to-strong

Updated May 29, 2024
Python

jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

llama ppo dpo chatgpt rlhf qlora qwen mixtral llama3

Updated May 29, 2024
Python

InternLM / InternLM

Official release of InternLM2 7B and 20B base and chat models. 200K context support

chatbot chinese gpt pretrained-models llm long-context rlhf large-language-model flash-attention fine-tuning-llm

Updated May 29, 2024
Python

jazelly / FinetuneLLMs

Finetune an LLM, within a few clicks!

python mac ui ai llama train lora finetune sft llm rlhf

Updated May 29, 2024
JavaScript

log10-io / log10

Python client library for improving your LLM app accuracy

python debugging ai monitoring evaluations feedback logging artificial-intelligence openai agents autonomous-agents fine-tuning llms rlhf llmops anthropic

Updated May 29, 2024
Python

argilla-io / distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.

python ai openai synthetic-data synthetic-dataset-generation huggingface llms rlhf rlaif