Visualize LLM Evaluations for OpenAI Assistants
Updated Mar 27, 2024 - TypeScript
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
A collection of prompts for testing and evaluating LLMs.
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
A compilation of referenced benchmark metrics for evaluating different aspects of knowledge in Large Language Models.
Summary Evaluation Tool
EnsembleX uses the knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization (a rough knapsack sketch follows the list of entries below).
Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
TypeScript SDK for the prompt engineering, prompt management, and prompt testing tool Prompt Foundry
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Evaluate LLMs on reasoning and RAG using custom functions and datasets with LangChain
LLM Evaluation
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
Calibration Game is a game for getting better at identifying hallucinations in LLMs.
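The EnsembleX entry above frames ensemble selection as a knapsack problem: pick the set of models that maximizes aggregate quality without exceeding a cost budget. Below is a minimal, hypothetical 0/1 knapsack sketch in TypeScript illustrating that idea; the model names, quality scores, and costs are placeholders, and this is not EnsembleX's actual implementation.

```typescript
// Hypothetical illustration only: choosing an LLM ensemble under a cost budget
// with 0/1 knapsack dynamic programming. Model names, quality scores, and
// costs are made-up placeholders, not values from EnsembleX.

interface ModelCandidate {
  name: string;
  quality: number; // aggregate benchmark score (higher is better)
  cost: number;    // integer cost units, e.g. scaled price per 1K tokens
}

function selectEnsemble(models: ModelCandidate[], budget: number): ModelCandidate[] {
  // dp[c] = best total quality achievable with total cost <= c
  const dp: number[] = new Array(budget + 1).fill(0);
  // pick[c] = indices of the models behind dp[c]
  const pick: number[][] = Array.from({ length: budget + 1 }, () => [] as number[]);

  for (let i = 0; i < models.length; i++) {
    const { quality, cost } = models[i];
    // iterate budgets downward so each model is used at most once (0/1 knapsack)
    for (let c = budget; c >= cost; c--) {
      if (dp[c - cost] + quality > dp[c]) {
        dp[c] = dp[c - cost] + quality;
        pick[c] = [...pick[c - cost], i];
      }
    }
  }
  return pick[budget].map((i) => models[i]);
}

// Example with placeholder numbers: maximize quality within a budget of 50.
const candidates: ModelCandidate[] = [
  { name: "model-a", quality: 86, cost: 30 },
  { name: "model-b", quality: 78, cost: 12 },
  { name: "model-c", quality: 91, cost: 45 },
  { name: "model-d", quality: 70, cost: 8 },
];

console.log(selectEnsemble(candidates, 50).map((m) => m.name));
// -> [ 'model-a', 'model-b', 'model-d' ] for these placeholder values
```

Iterating budgets downward is the standard 0/1 knapsack trick that keeps each candidate model from being counted twice; a real tool would also fold in per-domain quality estimates rather than a single score.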