
hipamod (HIghly PArallel MOdel Deployment)

Large Language Models (LLMs) are becoming more common, which is great, but they require huge amounts of computing resources to deploy (see the metaseq API for a reference point). What if we could run LLMs using only a few laptops?

Goal

This project seeks to do two things:

  • clarify the problem: how do we perform a hardware-dependent scaling analysis for deploying LLMs?
  • solve it if the scaling analysis shows it is solvable; otherwise, show why it is not.

To be clear, I'm aware that PC CPUs cannot compete with TPU/GPU clusters. However, even reaching 3% of the efficiency of those more expensive setups could still be very helpful.

Take models 100x bigger, compress them to be 100x smaller, and run them 100x faster.

In short, the mission is to run OPT-175B on distributed CPUs alone.
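
To make the scaling question concrete, here is a minimal back-of-envelope sketch in Python. The parameter count comes from the model name (OPT-175B); the per-laptop RAM and memory-bandwidth figures are assumptions chosen purely for illustration, and the throughput bound ignores network and compute costs entirely.

```python
# Back-of-envelope scaling analysis for serving OPT-175B on laptop CPUs.
# The RAM and bandwidth numbers below are illustrative assumptions,
# not measurements of any specific machine.

N_PARAMS = 175e9          # OPT-175B parameter count
BYTES_PER_PARAM = 2       # fp16 weights, no compression
LAPTOP_RAM_GB = 16        # assumed usable RAM per laptop
LAPTOP_BW_GBPS = 40       # assumed memory bandwidth per laptop (GB/s)

model_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
laptops_needed = model_gb / LAPTOP_RAM_GB

# Autoregressive decoding reads every weight roughly once per token,
# so each laptop's throughput is bounded by bandwidth over its shard.
shard_gb = model_gb / laptops_needed  # equals LAPTOP_RAM_GB by construction
tokens_per_sec = LAPTOP_BW_GBPS / shard_gb

print(f"model size:     {model_gb:.0f} GB")        # ~350 GB
print(f"laptops needed: {laptops_needed:.0f}")     # ~22
print(f"upper bound:    {tokens_per_sec:.1f} tokens/s (ignoring network)")
```

Under these assumptions, roughly two dozen 16 GB laptops are needed just to hold the uncompressed weights, and memory bandwidth alone caps decoding at a few tokens per second; this is exactly the kind of hardware-dependent bound the scaling analysis aims to pin down, and it shows why compression is central to the mission.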
