Releases: hpcaitech/ColossalAI

Version v0.3.8 Release Today!

31 May 11:41
68359ed

What's Changed

Gemini

  • Merge pull request #5749 from hpcaitech/prefetch by botbw
  • Merge pull request #5754 from Hz188/prefetch by botbw
  • [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
  • [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
  • Merge pull request #5733 from Hz188/feature/prefetch by botbw
  • Merge pull request #5731 from botbw/prefetch by botbw
  • [gemini] init auto policy prefetch by hxwang
  • Merge pull request #5722 from botbw/prefetch by botbw
  • [gemini] maxprefetch means maximum work to keep by hxwang (sketched below)
  • [gemini] use compute_chunk to find next chunk by hxwang
  • [gemini] prefetch chunks by hxwang
  • [gemini]remove registered gradients hooks (#5696) by flybird11111
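
A minimal sketch of wiring up the chunk-prefetch work above through the booster API. The `max_prefetch` argument and its meaning are assumptions taken from the commit titles ("maxprefetch means maximum work to keep"); verify the exact name against your installed version.

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch(config={})  # older signature; newer releases drop `config`

plugin = GeminiPlugin(
    placement_policy="auto",
    max_prefetch=4,  # assumed: upper bound on chunks fetched ahead of compute
)
booster = Booster(plugin=plugin)

model = torch.nn.Linear(1024, 1024)
optimizer = HybridAdam(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)
```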

Chore

  • [chore] refactor profiler utils by hxwang
  • [chore] remove unnecessary assert since compute list might not be recorded by hxwang
  • [chore] remove unnecessary test & changes by hxwang
  • Merge pull request #5738 from botbw/prefetch by Haze188
  • [chore] fix init error by hxwang
  • [chore] Update placement_policy.py by botbw
  • [chore] remove debugging info by hxwang
  • [chore] remove print by hxwang
  • [chore] refactor & sync by hxwang
  • [chore] sync by hxwang

Bugs

  • [bugs] fix args.profile=False DummyProfiler error by genghaozhe

Feature

  • [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz (sketched below)
  • [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
  • Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
  • [Feature] qlora support (#5586) by linsj20
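
A hedged sketch of what the optimizer auto-cast feature (#5746) implies: pass a supported optimizer (Lamb, GaLore, CAME, Adafactor) to `booster.boost()` under a tensor-parallel plugin and it should be swapped for its distributed counterpart automatically. The `Lamb` import path is an assumption based on ColossalAI's usual layout.

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.nn.optimizer import Lamb  # assumed import path

colossalai.launch_from_torch(config={})  # run with torchrun and at least 2 processes

model = torch.nn.Linear(1024, 1024)
optimizer = Lamb(model.parameters(), lr=1e-3)

booster = Booster(plugin=HybridParallelPlugin(tp_size=2, pp_size=1))
# After boosting, `optimizer` should be the distributed Lamb variant (#5746).
model, optimizer, *_ = booster.boost(model, optimizer)
```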

Colossal-inference

  • [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao

Shardformer

  • [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
  • [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
  • [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
  • Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
  • [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo (sketched below)
  • [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
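
The divisibility assert added in #5670 boils down to a simple shape constraint: tensor parallelism splits attention heads across ranks, so the head count must divide evenly. An illustration, not the Shardformer code itself:

```python
def check_tp_compatibility(num_heads: int, tp_size: int) -> None:
    # Each tensor-parallel rank owns num_heads // tp_size attention heads,
    # so the division must be exact.
    assert num_heads % tp_size == 0, (
        f"num_heads ({num_heads}) must be divisible by tp_size ({tp_size})"
    )

check_tp_compatibility(num_heads=32, tp_size=4)   # OK: 8 heads per rank
# check_tp_compatibility(num_heads=32, tp_size=6) # raises AssertionError
```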

Colossal-llama

  • [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li

Fix

  • [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
  • [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
  • [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao

Version v0.3.7 Release Today!

27 Apr 11:00
4cfbf30

What's Changed

Hotfix

  • [hotfix] add soft link to support required files (#5661) by Tong Li
  • [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
  • [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
  • [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
  • [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
  • [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
  • [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

Fix

  • [Fix]: implement thread-safe singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season (sketched below)
  • [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
  • [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
  • [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao
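
A generic sketch of the pattern behind #5625: a double-checked-locking singleton, so concurrent first accesses neither race nor deadlock. This illustrates the technique only, not ColossalAI's actual class.

```python
import threading

class SingletonMeta(type):
    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:          # fast path: no locking once created
            with cls._lock:                    # slow path: serialize first creation
                if cls not in cls._instances:  # re-check under the lock
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Registry(metaclass=SingletonMeta):
    pass

assert Registry() is Registry()  # every call yields the same instance
```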

Shardformer, pipeline

  • [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen
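
A generic illustration of what a `gradient_checkpointing_ratio` means in #5508: only a fraction of layers recompute activations in backward, trading compute for memory. This is a plain-PyTorch sketch, not the shard policy implementation.

```python
import math
import torch
from torch.utils.checkpoint import checkpoint

def forward_with_ratio(layers, x, ratio: float = 0.5):
    num_ckpt = math.ceil(len(layers) * ratio)  # checkpoint the first `num_ckpt` layers
    for i, layer in enumerate(layers):
        if i < num_ckpt:
            x = checkpoint(layer, x, use_reentrant=False)  # recomputed in backward
        else:
            x = layer(x)  # activations kept in memory as usual
    return x
```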

Full Changelog: v0.3.6...v0.3.7

Version v0.3.6 Release Today!

07 Mar 15:38
8020f42

What's Changed

Colossal-llama2

  • [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong

Fsdp

  • [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo
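
A hedged sketch of sharded FSDP checkpointing in the spirit of #5357, written against plain PyTorch APIs (each rank saves only its own shard); the ColossalAI implementation differs in detail.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

def save_sharded(model: FSDP, path_prefix: str) -> None:
    # SHARDED_STATE_DICT keeps each rank's local shard instead of gathering
    # the full model on rank 0.
    with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
        shard = model.state_dict()
    torch.save(shard, f"{path_prefix}.rank{dist.get_rank()}.pt")
```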

Full Changelog: v0.3.5...v0.3.6

Version v0.3.5 Release Today!

23 Feb 08:46
adae123

What's Changed

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
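
Reusing the `booster`, `model`, and `optimizer` from the GeminiPlugin sketch under v0.3.8 above, checkpointing goes through the booster's checkpoint IO that #5347 fixes; the calls follow the public `Booster` API but should be verified against your version.

```python
# Save sharded checkpoints (one directory per artifact when shard=True).
booster.save_model(model, "ckpt/model", shard=True)
booster.save_optimizer(optimizer, "ckpt/optim", shard=True)

# ...later, after rebuilding and boosting the same model/optimizer:
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optim")
```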

Shardformer

  • [shardformer] hybridparallelplugin support gradient accumulation. (#5246) by flybird11111 (sketched below)
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer]fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen (sketched below)
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
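
The deadlock that #5214 addresses arises when every pipeline stage blocks on `send` before `recv`, so neighbors wait on each other forever; a fixed fallback order (for example by rank parity) breaks the cycle. A simplified illustration, not the `_communicate` implementation:

```python
import torch
import torch.distributed as dist

def exchange(tensor_out, tensor_in, prev_rank: int, next_rank: int) -> None:
    # Even ranks send first, odd ranks receive first, so at least one side
    # of every pairing is always ready to receive.
    if dist.get_rank() % 2 == 0:
        dist.send(tensor_out, dst=next_rank)
        dist.recv(tensor_in, src=prev_rank)
    else:
        dist.recv(tensor_in, src=prev_rank)
        dist.send(tensor_out, dst=next_rank)
```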

Version v0.3.4 Release Today!

01 Nov 05:57
8993c8a

What's Changed

Pipeline inference

  • [Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
  • [Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
  • [Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia

Hotfix

  • [hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
  • [hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
  • [hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
  • [hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
  • [hotfix] fix bug in sequence parallel test (#4887) by littsk
  • [hotfix] Correct several erroneous code comments (#4794) by littsk
  • [hotfix] fix norm type error in zero optimizer (#4795) by littsk
  • [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing

Kernels

  • [Kernels] Updated Triton kernels to 2.1.0 and added flash-decoding for llama token attention (#4965) by Cuiqing Li

Inference

  • [Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai (sketched below)
  • [Inference] Add bench chatglm2 script (#4963) by Jianghai
  • [inference] add reference and fix some bugs (#4937) by Xu Kai
  • [inference] Add smoothquant for llama (#4904) by Xu Kai
  • [inference] add llama2 support (#4898) by Xu Kai
  • [inference] fix import bug and delete useless init (#4830) by Jianghai
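
A toy sketch of the idea behind dynamic batching (#4953): requests arriving at different times are grouped into one forward pass up to a token budget. The real engine also schedules KV-cache slots; this shows only the batching decision.

```python
from collections import deque

class DynamicBatcher:
    def __init__(self, max_batch_tokens: int = 4096):
        self.queue = deque()
        self.max_batch_tokens = max_batch_tokens

    def submit(self, request_id: str, num_tokens: int) -> None:
        self.queue.append((request_id, num_tokens))

    def next_batch(self) -> list:
        # Greedily pack queued requests until the token budget is exhausted.
        batch, budget = [], self.max_batch_tokens
        while self.queue and self.queue[0][1] <= budget:
            request = self.queue.popleft()
            budget -= request[1]
            batch.append(request)
        return batch
```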

Test

  • [test] merge old components to test to model zoo (#4945) by Hongxin Liu
  • [test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
  • Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero by ppt0011
  • [test] modify model supporting part of low_level_zero plugin (including corresponding docs) by Zhongkai Zhao

Refactor

  • [Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li

Kernel

  • [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu

Feature

  • [feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
  • [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk (sketched below)
  • [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen
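
Per #4837, gradient clipping under the hybrid parallel plugin is configured on the plugin rather than called manually; the `max_norm` argument name follows the plugin's public signature but should be verified on your version.

```python
from colossalai.booster.plugin import HybridParallelPlugin

# Gradients are clipped to this norm inside optimizer.step(); no manual
# torch.nn.utils.clip_grad_norm_ call is needed.
plugin = HybridParallelPlugin(tp_size=2, pp_size=1, max_norm=1.0)
```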

Misc

  • [misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu
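
What #4778 enables, sketched: passing `last_epoch` so the scheduler resumes mid-run instead of restarting its warmup. Note that PyTorch schedulers require `initial_lr` in each param group when `last_epoch != -1` (normally set by `load_state_dict` when resuming).

```python
import torch
from colossalai.nn.lr_scheduler import CosineAnnealingWarmupLR

optimizer = torch.optim.SGD(torch.nn.Linear(8, 8).parameters(), lr=0.1)
for group in optimizer.param_groups:
    group.setdefault("initial_lr", group["lr"])  # needed when last_epoch != -1

scheduler = CosineAnnealingWarmupLR(
    optimizer,
    total_steps=10_000,
    warmup_steps=500,
    last_epoch=2_499,  # resume as if 2,500 steps had already run
)
```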

Full Changelog: v0.3.3...v0.3.4

Version v0.3.3 Release Today!

22 Sep 10:30
4146f1c

What's Changed

Feature

  • [feature] add gptq for inference (#4754) by Xu Kai
  • [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li

Bug

  • [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
  • [bug] fix get_default_parser in examples (#4764) by Baizhou Zhang

Full Changelog: v0.3.2...v0.3.3

Version v0.3.2 Release Today!

06 Sep 15:42
9709b8f

What's Changed

Zero

  • [zero] hotfix master param sync (#4618) by Hongxin Liu
  • [zero]fix zero ckptIO with offload (#4529) by LuGY
  • [zero]support zero2 with gradient accumulation (#4511) by LuGY

Pipeline

  • [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
  • [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
  • [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
  • [pipeline] add chatglm (#4363) by Jianghai
  • [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
  • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
  • [pipeline] add unit test for 1f1b (#4303) by LuGY
  • [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang

Version v0.3.1 Release Today!

01 Aug 07:02
8064771

What's Changed

Zero

  • [zero] optimize the optimizer step time (#4221) by LuGY
  • [zero] support shard optimizer state dict of zero (#4194) by LuGY
  • [zero] add state dict for low level zero (#4179) by LuGY
  • [zero] allow passing process group to zero12 (#4153) by LuGY
  • [zero]support no_sync method for zero1 plugin (#4138) by LuGY (sketched below)
  • [zero] refactor low level zero for shard evenly (#4030) by LuGY
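
A hedged sketch of the `no_sync` method #4138 adds for the zero1 plugin: skip gradient synchronization on accumulation steps and sync only on the stepping one. The `booster.no_sync(model, optimizer)` signature is an assumption from the PR title; verify it locally.

```python
from contextlib import nullcontext

import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
data = [(torch.randn(4, 16), torch.randint(0, 2, (4,))) for _ in range(8)]

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)

accum_steps = 4
for step, (x, y) in enumerate(data):
    sync = (step + 1) % accum_steps == 0
    # assumed signature: skip grad sync except on the stepping micro-batch
    with (nullcontext() if sync else booster.no_sync(model, optimizer)):
        booster.backward(criterion(model(x.cuda()), y.cuda()), optimizer)
    if sync:
        optimizer.step()
        optimizer.zero_grad()
```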

Nfc

  • [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
  • [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
  • [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
  • [NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
  • [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
  • [NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) by Xu Kai
  • [NFC] fix: format (#4270) by dayellow
  • [NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
  • [NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
  • [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
  • [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
  • [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
  • [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
  • [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocd_with_naming
  • [NFC] polish applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
  • [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
  • [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
  • [NFC] Fix format for mixed precision (#4253) by Jianghai
  • [nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
  • [nfc] fix dim not defined and fix typo (#3991) by digger yu
  • [nfc] fix typo colossalai/zero (#3923) by digger yu
  • [nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
  • [nfc] fix typo colossalai/nn (#3887) by digger yu
  • [nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu

Kernels

  • [Kernels] added Triton implementation of self-attention for colossal-ai (#4241) by Cuiqing Li

Version v0.3.0 Release Today!

25 May 08:26
d42b1be

What's Changed

Nfc

  • [nfc] fix typo colossalai/ applications/ (#3831) by digger yu
  • [NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
  • [NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
  • [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742) by digger yu
  • [NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
  • [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) by Ofey Chan
  • [NFC] polish colossalai/context/random/__init__.py code style (#3327) by yuxuan-lou
  • [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
  • [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
  • [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
  • [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
  • [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
  • [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
  • [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
  • [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
  • [NFC] polish colossalai/amp/__init__.py code style (#3272) by lucasliunju
  • [NFC] polish code style (#3273) by Xuanlei Zhao
  • [NFC] polish colossalai/fx/proxy.py code style (#3269) by CZYCW
  • [NFC] polish code style (#3268) by Yuanchen
  • [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
  • [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
  • [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
  • [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
  • [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow

Docs

  • [docs] change placememt_policy to placement_policy (#3829) by digger yu

Evaluation

  • [evaluation] add automatic evaluation pipeline (#3821) by Yuanchen

Api

  • [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan

Version v0.2.8 Release Today!

29 Mar 02:26
a0b3749

What's Changed

Lazyinit

  • [lazyinit] combine lazy tensor with dtensor (#3204) by ver217
  • [lazyinit] add correctness verification (#3147) by ver217
  • [lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217
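
Lazy initialization as refactored in #3131, sketched: modules are constructed without allocating real storage, then materialized later (typically once a sharding plan is known). The import path and `materialize` helper follow later ColossalAI releases and may differ in this version.

```python
import torch
from colossalai.lazy import LazyInitContext  # path from later releases; may differ here

with LazyInitContext():
    model = torch.nn.Linear(4096, 4096)  # no real parameter storage allocated yet

LazyInitContext.materialize(model)  # assumed helper: allocate and initialize for real
```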

Dreambooth

  • [dreambooth] fixing the incompatibility in requirements.txt (#3190) by NatalieC323

Auto-parallel

  • [auto-parallel] add auto-offload feature (#3154) by Zihao

Zero

  • [zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH
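
A generic before/after of the refactor style in #3186: replacing a hand-written config class with a dataclass gets `__init__`, `__repr__`, and `__eq__` for free. Field names here are illustrative, not necessarily the actual `ZeroContextConfig` fields.

```python
from dataclasses import dataclass

@dataclass
class ZeroContextConfig:
    target_device: str = "cuda"
    replicated: bool = True
    shard_param: bool = False
```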

Tests

  • [tests] model zoo add torchaudio models (#3138) by ver217
  • [tests] diffuser models in model zoo (#3136) by HELSON

Workflow

  • [workflow] purged extension cache before GPT test (#3128) by Frank Lee

Full Changelog: v0.2.7...v0.2.8