Releases: hpcaitech/ColossalAI

Version v0.3.8 Release Today!

31 May 11:41
68359ed

What's Changed

Gemini

  • Merge pull request #5749 from hpcaitech/prefetch by botbw
  • Merge pull request #5754 from Hz188/prefetch by botbw
  • [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
  • [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
  • Merge pull request #5733 from Hz188/feature/prefetch by botbw
  • Merge pull request #5731 from botbw/prefetch by botbw
  • [gemini] init auto policy prefetch by hxwang
  • Merge pull request #5722 from botbw/prefetch by botbw
  • [gemini] maxprefetch means maximum work to keep by hxwang (sketched below)
  • [gemini] use compute_chunk to find next chunk by hxwang
  • [gemini] prefetch chunks by hxwang
  • [gemini]remove registered gradients hooks (#5696) by flybird11111
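
A minimal sketch of wiring up the chunk-prefetch work above through the booster API. The `max_prefetch` argument and its meaning are assumptions taken from the commit titles ("maxprefetch means maximum work to keep"); verify the exact name against your installed version.

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch(config={})  # older signature; newer releases drop `config`

plugin = GeminiPlugin(
    placement_policy="auto",
    max_prefetch=4,  # assumed: upper bound on chunks fetched ahead of compute
)
booster = Booster(plugin=plugin)

model = torch.nn.Linear(1024, 1024)
optimizer = HybridAdam(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)
```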

Chore

  • [chore] refactor profiler utils by hxwang
  • [chore] remove unnecessary assert since compute list might not be recorded by hxwang
  • [chore] remove unnecessary test & changes by hxwang
  • Merge pull request #5738 from botbw/prefetch by Haze188
  • [chore] fix init error by hxwang
  • [chore] Update placement_policy.py by botbw
  • [chore] remove debugging info by hxwang
  • [chore] remove print by hxwang
  • [chore] refactor & sync by hxwang
  • [chore] sync by hxwang

Bugs

  • [bugs] fix args.profile=False DummyProfiler error by genghaozhe

Feature

  • [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz (sketched below)
  • [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
  • Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
  • [Feature] qlora support (#5586) by linsj20
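
A hedged sketch of what the optimizer auto-cast feature (#5746) implies: pass a supported optimizer (Lamb, GaLore, CAME, Adafactor) to `booster.boost()` under a tensor-parallel plugin and it should be swapped for its distributed counterpart automatically. The `Lamb` import path is an assumption based on ColossalAI's usual layout.

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.nn.optimizer import Lamb  # assumed import path

colossalai.launch_from_torch(config={})  # run with torchrun and at least 2 processes

model = torch.nn.Linear(1024, 1024)
optimizer = Lamb(model.parameters(), lr=1e-3)

booster = Booster(plugin=HybridParallelPlugin(tp_size=2, pp_size=1))
# After boosting, `optimizer` should be the distributed Lamb variant (#5746).
model, optimizer, *_ = booster.boost(model, optimizer)
```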

Colossal-inference

  • [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao

Shardformer

  • [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
  • [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
  • [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
  • Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
  • [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo (sketched below)
  • [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
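
The divisibility assert added in #5670 boils down to a simple shape constraint: tensor parallelism splits attention heads across ranks, so the head count must divide evenly. An illustration, not the Shardformer code itself:

```python
def check_tp_compatibility(num_heads: int, tp_size: int) -> None:
    # Each tensor-parallel rank owns num_heads // tp_size attention heads,
    # so the division must be exact.
    assert num_heads % tp_size == 0, (
        f"num_heads ({num_heads}) must be divisible by tp_size ({tp_size})"
    )

check_tp_compatibility(num_heads=32, tp_size=4)   # OK: 8 heads per rank
# check_tp_compatibility(num_heads=32, tp_size=6) # raises AssertionError
```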

Colossal-llama

  • [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li

Fix

  • [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
  • [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
  • [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao

Version v0.3.7 Release Today!

27 Apr 11:00
4cfbf30

What's Changed

Hotfix

  • [hotfix] add soft link to support required files (#5661) by Tong Li
  • [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
  • [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
  • [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
  • [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
  • [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
  • [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

Fix

  • [Fix]: implement thread-safe singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season (sketched below)
  • [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
  • [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
  • [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao
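
A generic sketch of the pattern behind #5625: a double-checked-locking singleton, so concurrent first accesses neither race nor deadlock. This illustrates the technique only, not ColossalAI's actual class.

```python
import threading

class SingletonMeta(type):
    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:          # fast path: no locking once created
            with cls._lock:                    # slow path: serialize first creation
                if cls not in cls._instances:  # re-check under the lock
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Registry(metaclass=SingletonMeta):
    pass

assert Registry() is Registry()  # every call yields the same instance
```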

Shardformer, pipeline

  • [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen
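
A generic illustration of what a `gradient_checkpointing_ratio` means in #5508: only a fraction of layers recompute activations in backward, trading compute for memory. This is a plain-PyTorch sketch, not the shard policy implementation.

```python
import math
import torch
from torch.utils.checkpoint import checkpoint

def forward_with_ratio(layers, x, ratio: float = 0.5):
    num_ckpt = math.ceil(len(layers) * ratio)  # checkpoint the first `num_ckpt` layers
    for i, layer in enumerate(layers):
        if i < num_ckpt:
            x = checkpoint(layer, x, use_reentrant=False)  # recomputed in backward
        else:
            x = layer(x)  # activations kept in memory as usual
    return x
```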

Full Changelog: v0.3.6...v0.3.7

Version v0.3.6 Release Today!

07 Mar 15:38
8020f42

What's Changed

Colossal-llama2

  • [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong

Fsdp

  • [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo
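
A hedged sketch of sharded FSDP checkpointing in the spirit of #5357, written against plain PyTorch APIs (each rank saves only its own shard); the ColossalAI implementation differs in detail.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

def save_sharded(model: FSDP, path_prefix: str) -> None:
    # SHARDED_STATE_DICT keeps each rank's local shard instead of gathering
    # the full model on rank 0.
    with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
        shard = model.state_dict()
    torch.save(shard, f"{path_prefix}.rank{dist.get_rank()}.pt")
```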

Full Changelog: v0.3.5...v0.3.6

Version v0.3.5 Release Today!

23 Feb 08:46
adae123

What's Changed

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
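
Reusing the `booster`, `model`, and `optimizer` from the GeminiPlugin sketch under v0.3.8 above, checkpointing goes through the booster's checkpoint IO that #5347 fixes; the calls follow the public `Booster` API but should be verified against your version.

```python
# Save sharded checkpoints (one directory per artifact when shard=True).
booster.save_model(model, "ckpt/model", shard=True)
booster.save_optimizer(optimizer, "ckpt/optim", shard=True)

# ...later, after rebuilding and boosting the same model/optimizer:
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optim")
```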

Shardformer

  • [shardformer] hybridparallelplugin support gradient accumulation. (#5246) by flybird11111 (sketched below)
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer]fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen (sketched below)
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
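
The deadlock that #5214 addresses arises when every pipeline stage blocks on `send` before `recv`, so neighbors wait on each other forever; a fixed fallback order (for example by rank parity) breaks the cycle. A simplified illustration, not the `_communicate` implementation:

```python
import torch
import torch.distributed as dist

def exchange(tensor_out, tensor_in, prev_rank: int, next_rank: int) -> None:
    # Even ranks send first, odd ranks receive first, so at least one side
    # of every pairing is always ready to receive.
    if dist.get_rank() % 2 == 0:
        dist.send(tensor_out, dst=next_rank)
        dist.recv(tensor_in, src=prev_rank)
    else:
        dist.recv(tensor_in, src=prev_rank)
        dist.send(tensor_out, dst=next_rank)
```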

Version v0.3.4 Release Today!

01 Nov 05:57
8993c8a

What's Changed

Pipeline inference

  • [Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
  • [Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
  • [Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia

Hotfix

  • [hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
  • [hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
  • [hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
  • [hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
  • [hotfix] fix bug in sequence parallel test (#4887) by littsk
  • [hotfix] Correct several erroneous code comments (#4794) by littsk
  • [hotfix] fix norm type error in zero optimizer (#4795) by littsk
  • [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing

Kernels

  • [Kernels] Updated Triton kernels to 2.1.0 and added flash-decoding for llama token attention (#4965) by Cuiqing Li

Inference

  • [Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai (sketched below)
  • [Inference] Add bench chatglm2 script (#4963) by Jianghai
  • [inference] add reference and fix some bugs (#4937) by Xu Kai
  • [inference] Add smoothquant for llama (#4904) by Xu Kai
  • [inference] add llama2 support (#4898) by Xu Kai
  • [inference] fix import bug and delete useless init (#4830) by Jianghai
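
A toy sketch of the idea behind dynamic batching (#4953): requests arriving at different times are grouped into one forward pass up to a token budget. The real engine also schedules KV-cache slots; this shows only the batching decision.

```python
from collections import deque

class DynamicBatcher:
    def __init__(self, max_batch_tokens: int = 4096):
        self.queue = deque()
        self.max_batch_tokens = max_batch_tokens

    def submit(self, request_id: str, num_tokens: int) -> None:
        self.queue.append((request_id, num_tokens))

    def next_batch(self) -> list:
        # Greedily pack queued requests until the token budget is exhausted.
        batch, budget = [], self.max_batch_tokens
        while self.queue and self.queue[0][1] <= budget:
            request = self.queue.popleft()
            budget -= request[1]
            batch.append(request)
        return batch
```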

Test

  • [test] merge old components to test to model zoo (#4945) by Hongxin Liu
  • [test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
  • Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero by ppt0011
  • [test] modify model supporting part of low_level_zero plugin (including corresponding docs) by Zhongkai Zhao

Refactor

  • [Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li

Kernel

  • [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu

Feature

  • [feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
  • [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk (sketched below)
  • [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen
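
Per #4837, gradient clipping under the hybrid parallel plugin is configured on the plugin rather than called manually; the `max_norm` argument name follows the plugin's public signature but should be verified on your version.

```python
from colossalai.booster.plugin import HybridParallelPlugin

# Gradients are clipped to this norm inside optimizer.step(); no manual
# torch.nn.utils.clip_grad_norm_ call is needed.
plugin = HybridParallelPlugin(tp_size=2, pp_size=1, max_norm=1.0)
```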

Misc

  • [misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu
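
What #4778 enables, sketched: passing `last_epoch` so the scheduler resumes mid-run instead of restarting its warmup. Note that PyTorch schedulers require `initial_lr` in each param group when `last_epoch != -1` (normally set by `load_state_dict` when resuming).

```python
import torch
from colossalai.nn.lr_scheduler import CosineAnnealingWarmupLR

optimizer = torch.optim.SGD(torch.nn.Linear(8, 8).parameters(), lr=0.1)
for group in optimizer.param_groups:
    group.setdefault("initial_lr", group["lr"])  # needed when last_epoch != -1

scheduler = CosineAnnealingWarmupLR(
    optimizer,
    total_steps=10_000,
    warmup_steps=500,
    last_epoch=2_499,  # resume as if 2,500 steps had already run
)
```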

Full Changelog: v0.3.3...v0.3.4

Version v0.3.3 Release Today!

22 Sep 10:30
4146f1c

What's Changed

Feature

  • [feature] add gptq for inference (#4754) by Xu Kai
  • [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li

Bug

  • [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
  • [bug] fix get_default_parser in examples (#4764) by Baizhou Zhang

Full Changelog: v0.3.2...v0.3.3

Version v0.3.2 Release Today!

06 Sep 15:42
9709b8f

What's Changed

Zero

  • [zero] hotfix master param sync (#4618) by Hongxin Liu
  • [zero]fix zero ckptIO with offload (#4529) by LuGY
  • [zero]support zero2 with gradient accumulation (#4511) by LuGY

Pipeline

  • [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
  • [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
  • [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
  • [pipeline] add chatglm (#4363) by Jianghai
  • [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
  • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
  • [pipeline] add unit test for 1f1b (#4303) by LuGY
  • [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang

Version v0.3.1 Release Today!

01 Aug 07:02
8064771

What's Changed

Zero

  • [zero] optimize the optimizer step time (#4221) by LuGY
  • [zero] support shard optimizer state dict of zero (#4194) by LuGY
  • [zero] add state dict for low level zero (#4179) by LuGY
  • [zero] allow passing process group to zero12 (#4153) by LuGY
  • [zero]support no_sync method for zero1 plugin (#4138) by LuGY (sketched below)
  • [zero] refactor low level zero for shard evenly (#4030) by LuGY
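
A hedged sketch of the `no_sync` method #4138 adds for the zero1 plugin: skip gradient synchronization on accumulation steps and sync only on the stepping one. The `booster.no_sync(model, optimizer)` signature is an assumption from the PR title; verify it locally.

```python
from contextlib import nullcontext

import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
data = [(torch.randn(4, 16), torch.randint(0, 2, (4,))) for _ in range(8)]

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)

accum_steps = 4
for step, (x, y) in enumerate(data):
    sync = (step + 1) % accum_steps == 0
    # assumed signature: skip grad sync except on the stepping micro-batch
    with (nullcontext() if sync else booster.no_sync(model, optimizer)):
        booster.backward(criterion(model(x.cuda()), y.cuda()), optimizer)
    if sync:
        optimizer.step()
        optimizer.zero_grad()
```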

Nfc

  • [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
  • [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
  • [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
  • [NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
  • [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
  • [NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) by Xu Kai
  • [NFC] fix: format (#4270) by dayellow
  • [NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
  • [NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
  • [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
  • [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
  • [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
  • [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
  • [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocd_with_naming
  • [NFC] polish applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
  • [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
  • [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
  • [NFC] Fix format for mixed precision (#4253) by Jianghai
  • [nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
  • [nfc] fix dim not defined and fix typo (#3991) by digger yu
  • [nfc] fix typo colossalai/zero (#3923) by digger yu
  • [nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
  • [nfc] fix typo colossalai/nn (#3887) by digger yu
  • [nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu

Kernels

  • [Kernels] added Triton implementation of self-attention for colossal-ai (#4241) by Cuiqing Li

Version v0.3.0 Release Today!

25 May 08:26
d42b1be

What's Changed

Nfc

  • [nfc] fix typo colossalai/ applications/ (#3831) by digger yu
  • [NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
  • [NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
  • [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742) by digger yu
  • [NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
  • [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) by Ofey Chan
  • [NFC] polish colossalai/context/random/__init__.py code style (#3327) by yuxuan-lou
  • [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
  • [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
  • [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
  • [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
  • [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
  • [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
  • [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
  • [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
  • [NFC] polish colossalai/amp/__init__.py code style (#3272) by lucasliunju
  • [NFC] polish code style (#3273) by Xuanlei Zhao
  • [NFC] polish colossalai/fx/proxy.py code style (#3269) by CZYCW
  • [NFC] polish code style (#3268) by Yuanchen
  • [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
  • [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
  • [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
  • [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
  • [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow

Docs

  • [docs] change placememt_policy to placement_policy (#3829) by digger yu

Evaluation

  • [evaluation] add automatic evaluation pipeline (#3821) by Yuanchen

Api

  • [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan

Version v0.2.8 Release Today!

29 Mar 02:26
a0b3749

What's Changed

Lazyinit

  • [lazyinit] combine lazy tensor with dtensor (#3204) by ver217
  • [lazyinit] add correctness verification (#3147) by ver217
  • [lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217
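
Lazy initialization as refactored in #3131, sketched: modules are constructed without allocating real storage, then materialized later (typically once a sharding plan is known). The import path and `materialize` helper follow later ColossalAI releases and may differ in this version.

```python
import torch
from colossalai.lazy import LazyInitContext  # path from later releases; may differ here

with LazyInitContext():
    model = torch.nn.Linear(4096, 4096)  # no real parameter storage allocated yet

LazyInitContext.materialize(model)  # assumed helper: allocate and initialize for real
```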

Dreambooth

  • [dreambooth] fixing the incompatibility in requirements.txt (#3190) by NatalieC323

Auto-parallel

  • [auto-parallel] add auto-offload feature (#3154) by Zihao

Zero

  • [zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH
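
A generic before/after of the refactor style in #3186: replacing a hand-written config class with a dataclass gets `__init__`, `__repr__`, and `__eq__` for free. Field names here are illustrative, not necessarily the actual `ZeroContextConfig` fields.

```python
from dataclasses import dataclass

@dataclass
class ZeroContextConfig:
    target_device: str = "cuda"
    replicated: bool = True
    shard_param: bool = False
```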

Tests

  • [tests] model zoo add torchaudio models (#3138) by ver217
  • [tests] diffuser models in model zoo (#3136) by HELSON

Workflow

  • [workflow] purged extension cache before GPT test (#3128) by Frank Lee

Full Changelog: v0.2.7...v0.2.8