Visualize LLM Evaluations for OpenAI Assistants
Updated Mar 27, 2024 - TypeScript
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
A collection of prompts for testing and evaluating LLMs.
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
A compilation of referenced benchmark metrics for evaluating different aspects of knowledge in Large Language Models.
Summary Evaluation Tool
EnsembleX uses the knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization (a rough knapsack sketch follows the list of entries below).
Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
TypeScript SDK for the prompt engineering, prompt management, and prompt testing tool Prompt Foundry
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Evaluate LLMs on reasoning and RAG using custom functions and datasets with LangChain
LLM Evaluation
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
Calibration Game is a game for getting better at identifying hallucinations in LLMs.
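The EnsembleX entry above frames ensemble selection as a knapsack problem: pick the set of models that maximizes aggregate quality without exceeding a cost budget. Below is a minimal, hypothetical 0/1 knapsack sketch in TypeScript illustrating that idea; the model names, quality scores, and costs are placeholders, and this is not EnsembleX's actual implementation.

```typescript
// Hypothetical illustration only: choosing an LLM ensemble under a cost budget
// with 0/1 knapsack dynamic programming. Model names, quality scores, and
// costs are made-up placeholders, not values from EnsembleX.

interface ModelCandidate {
  name: string;
  quality: number; // aggregate benchmark score (higher is better)
  cost: number;    // integer cost units, e.g. scaled price per 1K tokens
}

function selectEnsemble(models: ModelCandidate[], budget: number): ModelCandidate[] {
  // dp[c] = best total quality achievable with total cost <= c
  const dp: number[] = new Array(budget + 1).fill(0);
  // pick[c] = indices of the models behind dp[c]
  const pick: number[][] = Array.from({ length: budget + 1 }, () => [] as number[]);

  for (let i = 0; i < models.length; i++) {
    const { quality, cost } = models[i];
    // iterate budgets downward so each model is used at most once (0/1 knapsack)
    for (let c = budget; c >= cost; c--) {
      if (dp[c - cost] + quality > dp[c]) {
        dp[c] = dp[c - cost] + quality;
        pick[c] = [...pick[c - cost], i];
      }
    }
  }
  return pick[budget].map((i) => models[i]);
}

// Example with placeholder numbers: maximize quality within a budget of 50.
const candidates: ModelCandidate[] = [
  { name: "model-a", quality: 86, cost: 30 },
  { name: "model-b", quality: 78, cost: 12 },
  { name: "model-c", quality: 91, cost: 45 },
  { name: "model-d", quality: 70, cost: 8 },
];

console.log(selectEnsemble(candidates, 50).map((m) => m.name));
// -> [ 'model-a', 'model-b', 'model-d' ] for these placeholder values
```

Iterating budgets downward is the standard 0/1 knapsack trick that keeps each candidate model from being counted twice; a real tool would also fold in per-domain quality estimates rather than a single score.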