Reweight GPT - a simple neural network using transformer architecture for next character prediction

hunar4321/reweight-gpt

Reweight GPT

An alternative to the self-attention mechanism in the Transformer architecture. It uses learnable lateral connections to reweight the inputs directly instead of computing attention weights from queries and keys (as illustrated below). To learn more about the method, watch this video (from 41:26): https://youtu.be/l-CjXFmcVzY
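A minimal PyTorch sketch of the idea, assuming the simplest reading of "direct re-weighting": the position-to-position mixing matrix is a learnable parameter (with a causal mask), rather than being computed from query/key projections. Class and variable names here are illustrative, not taken from the repo's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReweightBlock(nn.Module):
    """Replaces QK^T self-attention with a directly learnable
    (block_size x block_size) matrix of lateral connections."""

    def __init__(self, block_size, n_embd):
        super().__init__()
        # Learnable lateral connections between positions
        # (one weight per pair of positions, masked to be causal).
        self.lateral = nn.Parameter(torch.zeros(block_size, block_size))
        self.value = nn.Linear(n_embd, n_embd, bias=False)
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        # Mask out future positions, then normalize the mixing weights.
        w = self.lateral[:T, :T].masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        w = F.softmax(w, dim=-1)
        # Reweight the (value-projected) inputs directly -- no Q or K.
        return w @ self.value(x)

x = torch.randn(2, 8, 16)           # (batch, time, channels)
block = ReweightBlock(block_size=8, n_embd=16)
out = block(x)
print(out.shape)                    # torch.Size([2, 8, 16])
```

Unlike standard attention, the mixing weights here do not depend on the input content, only on position, which is what makes the re-weighting "direct".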

Files:

  1. The tutorial folder - a step-by-step tutorial from the basics to GPT.
  2. reweight-gpt.py - a multi-block GPT implementation using direct re-weighting of the attention matrix.
  3. reweight-gpt-nonlinear.py - a nonlinear version of the direct re-weighting method. For easy comparison between the two methods, I adapted this script directly from Andrej Karpathy's GPT implementation.

Illustration:
