Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Updated Jun 7, 2024 - Python
Mechanistically interpretable neurosymbolic AI (Nature Comput Sci 2024): losslessly compressing NNs to computer code and discovering new algorithms which generalize out-of-distribution and outperform human-designed algorithms
Interpreting how transformers simulate agents performing RL tasks
🧠 Starter templates for doing interpretability research
Sparse and discrete interpretability tool for neural networks
Full code for the sparse probing paper.
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
Explain a black-box module in natural language.
Steering vectors for transformer language models in PyTorch / Hugging Face
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Universal Neurons in GPT2 Language Models
For the OpenMOSS Mechanistic Interpretability Team's sparse autoencoder (SAE) research. Open-sourced and constantly updated.
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".
PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, DeepMind)
🦠 DeepDecipher: An open source API to MLP neurons
CoSy: Evaluating Textual Explanations
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
A mechanistic interpretability study investigating a sequential model trained to play the board game Othello
graphpatch is a library for activation patching on PyTorch neural network models.
This repository contains the code used for the experiments in the paper "Discovering Variable Binding Circuitry with Desiderata".
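Several entries in this list (graphpatch directly, and AtP*, which is a gradient-based approximation of it) revolve around activation patching. The sketch below is a minimal illustration under stated assumptions: it uses a hypothetical two-layer toy network written in plain Python, not the API of any listed library. It caches a hidden activation from a clean run, splices it into a corrupted run one unit at a time, and reports what fraction of the clean-vs-corrupted output gap each patched unit recovers.

```python
from typing import Dict, List, Optional

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(w: List[Vector], v: Vector) -> Vector:
    # Plain matrix-vector product: one output per row of w.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class ToyMLP:
    """Hypothetical two-layer network used only to illustrate patching."""

    def __init__(self, w1: List[Vector], w2: List[Vector]):
        self.w1 = w1  # input -> hidden weights
        self.w2 = w2  # hidden -> single scalar output

    def forward(
        self,
        x: Vector,
        patch: Optional[Dict[int, float]] = None,
        cache: Optional[Dict[str, Vector]] = None,
    ) -> float:
        hidden = relu(matvec(self.w1, x))
        if cache is not None:
            cache["hidden"] = list(hidden)  # store a copy of the activation
        if patch:
            # Overwrite selected hidden units with externally supplied values.
            for idx, val in patch.items():
                hidden[idx] = val
        return matvec(self.w2, hidden)[0]


def patch_effects(model: ToyMLP, clean_x: Vector, corrupt_x: Vector) -> List[float]:
    """Fraction of the clean-minus-corrupted output gap recovered per hidden unit."""
    cache: Dict[str, Vector] = {}
    clean_out = model.forward(clean_x, cache=cache)
    corrupt_out = model.forward(corrupt_x)
    gap = clean_out - corrupt_out
    effects = []
    for i, clean_val in enumerate(cache["hidden"]):
        # Corrupted run, but with hidden unit i restored to its clean value.
        patched_out = model.forward(corrupt_x, patch={i: clean_val})
        effects.append((patched_out - corrupt_out) / gap if gap else 0.0)
    return effects


if __name__ == "__main__":
    model = ToyMLP([[1.0, 0.0], [0.0, 1.0]], [[2.0, 1.0]])
    # Unit 0 recovers about 2/3 of the gap, unit 1 about 1/3.
    print(patch_effects(model, [1.0, 1.0], [0.0, 0.0]))
```

Real tools do the same thing on framework modules (for example via PyTorch forward hooks) and over many components at once; the toy version only shows the clean-cache, patch, and compare loop.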