{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":379078148,"defaultBranch":"main","name":"Megatron-DeepSpeed","ownerLogin":"microsoft","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2021-06-21T22:26:38.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/6154722?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1716237457.0","currentOid":""},"activityList":{"items":[{"before":"a1cec0b854bf3809f655271041d620e4040cb06c","after":"04fb1b46db076ca36fff4bdda38fcf1cec7ff27b","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-22T03:00:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Add DIR to llama scripts","shortMessageHtmlLink":"Add DIR to llama scripts"}},{"before":"fc04643b359903489eb7ef376e5493b614d84ee5","after":"a1cec0b854bf3809f655271041d620e4040cb06c","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-22T02:56:49.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Rework llama scripts","shortMessageHtmlLink":"Rework llama scripts"}},{"before":"df4ecc2b7c729346430d53792d2117a256fad4e6","after":"fc04643b359903489eb7ef376e5493b614d84ee5","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-22T02:25:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Add llama analysis script","shortMessageHtmlLink":"Add llama analysis script"}},{"before":"ffb9d5915cff2447f306a72d6b42c8f6c987d590","after":"df4ecc2b7c729346430d53792d2117a256fad4e6","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-22T02:06:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Add Llama universal script","shortMessageHtmlLink":"Add Llama universal script"}},{"before":"af5ceab97388120baeb01791158e79f50a05835a","after":"ffb9d5915cff2447f306a72d6b42c8f6c987d590","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-22T02:01:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Llama to bf16","shortMessageHtmlLink":"Llama to bf16"}},{"before":"0e3e0d26e9a156654b38075ae159373937a59142","after":"af5ceab97388120baeb01791158e79f50a05835a","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-22T01:09:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Llama UCP scripts:","shortMessageHtmlLink":"Llama UCP scripts:"}},{"before":"a77935c30ffd0bc2b88600c1bdbd115f7683dcb2","after":"0e3e0d26e9a156654b38075ae159373937a59142","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-21T22:00:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Update Llama UCP script","shortMessageHtmlLink":"Update Llama UCP script"}},{"before":"b6a68bf8fbebc0f3305b6fa18e93c65457b1ca4d","after":"a77935c30ffd0bc2b88600c1bdbd115f7683dcb2","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-20T21:47:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Add llama UCP command","shortMessageHtmlLink":"Add llama UCP command"}},{"before":null,"after":"b6a68bf8fbebc0f3305b6fa18e93c65457b1ca4d","ref":"refs/heads/lekurile/uc_bench","pushedAt":"2024-05-20T20:37:37.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Create plots from .csv","shortMessageHtmlLink":"Create plots from .csv"}},{"before":"bcedecd1ff788d4d363f3365fd396053a08d65be","after":"7eb36a11b3a9c48ed07b93692ccf22bfb5577f7e","ref":"refs/heads/main","pushedAt":"2024-05-13T09:55:56.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tjruwase","name":"Olatunji Ruwase","path":"/tjruwase","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4271600?s=80&v=4"},"commit":{"message":"get distributed backend name via accelerator and check loss_scale before writing to tb (#374)\n\n* check loss_scale before writing to tb\r\n\r\n* get distributed backend name via accelerator\r\n\r\n* add hccl distributed backend support","shortMessageHtmlLink":"get distributed backend name via accelerator and check loss_scale bef…"}},{"before":"3c5f47563f697702c1e305fa01b7563f54b747fc","after":"bcedecd1ff788d4d363f3365fd396053a08d65be","ref":"refs/heads/main","pushedAt":"2024-04-09T08:22:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tohtana","name":"Masahiro Tanaka","path":"/tohtana","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/81312776?s=80&v=4"},"commit":{"message":"Support MoE for GPTModelPipe (#373)\n\n* MOE: Support MoE layers creation for GPTModelPipe\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n* MOE: Support MoE aux loss for GPTModelPipe\r\n\r\nPropagate aux loss along GPTModelPipe layers by forwarding the aggregated loss\r\nfrom each transformer layer to the next transformer layer.\r\n\r\nIn addition, add a layer to GPTModelPipe, after the last transformer layer, to\r\ncatch the final aggregated aux loss and cache it for use in the loss function.\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n* MOE: Support display of MoE loss for GPTModelPipe\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n* MOE: Verify MoE with no pipe/grad partitioned\r\n\r\nCurrently PipelineEngine supports only a single tensor partitioning with grad.\r\nMoE model requires to forward with grad both the activations and the aux_loss.\r\nTherefore, until PilelineEngine limitation is removed, verify no partitioning\r\nwhen using MoE.\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n---------\r\n\r\nSigned-off-by: Moshe Island \r\nCo-authored-by: Moshe Island ","shortMessageHtmlLink":"Support MoE for GPTModelPipe (#373)"}},{"before":"888a63ad8e06d0e84b0d5904a310fa4218f23d9c","after":"3c5f47563f697702c1e305fa01b7563f54b747fc","ref":"refs/heads/main","pushedAt":"2024-04-02T17:50:47.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tjruwase","name":"Olatunji Ruwase","path":"/tjruwase","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4271600?s=80&v=4"},"commit":{"message":"remove contiguous copy for flash-attn opbuilder (#372)\n\n* remove unnecessary codes for latest flash-attn opbuilder\r\n\r\n* add use-flash-attn-builder to make flash_attn usage clear and compatible\r\n\r\n* use hasattr","shortMessageHtmlLink":"remove contiguous copy for flash-attn opbuilder (#372)"}},{"before":"ebe80252f492613fe60489224669a5a8f370dbd3","after":"888a63ad8e06d0e84b0d5904a310fa4218f23d9c","ref":"refs/heads/main","pushedAt":"2024-04-02T15:13:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"samadejacobs","name":"Sam Ade Jacobs","path":"/samadejacobs","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16696152?s=80&v=4"},"commit":{"message":"fix an issue for DP on Megatron-DeepSpeed (#368)","shortMessageHtmlLink":"fix an issue for DP on Megatron-DeepSpeed (#368)"}},{"before":"df0e2e42adbe021c4d472e27cce42a2775f4b7e9","after":"ebe80252f492613fe60489224669a5a8f370dbd3","ref":"refs/heads/main","pushedAt":"2024-03-10T17:07:41.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tjruwase","name":"Olatunji Ruwase","path":"/tjruwase","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4271600?s=80&v=4"},"commit":{"message":"MOE: Support disable top2 2nd expert sampling (#362)\n\nDeepSpeed's MoE top2 gating performs sampling to select 2nd expert.\r\nSupport configuration for disabling of sampling (i.e. using argmax).\r\nNew argument: --disable-moe-top2-2nd-expert-sampling.\r\n\r\nSigned-off-by: Moshe Island \r\nCo-authored-by: Moshe Island ","shortMessageHtmlLink":"MOE: Support disable top2 2nd expert sampling (#362)"}},{"before":"a9856ce0e75dbe69c96d4e241e8a191b344118d7","after":"df0e2e42adbe021c4d472e27cce42a2775f4b7e9","ref":"refs/heads/main","pushedAt":"2024-03-10T17:02:22.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tjruwase","name":"Olatunji Ruwase","path":"/tjruwase","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4271600?s=80&v=4"},"commit":{"message":"Support universal checkpoint for GPTModel (#361)\n\nSave to checkpoints the required universal patterns for GPTModel.\r\n\r\nAdditionally, unify the logic of universal checkpoint info for both GPTModel\r\nand GPTModelPipe under a new class: UniversalCheckpointInfo.\r\n\r\nSigned-off-by: Moshe Island \r\nCo-authored-by: Moshe Island ","shortMessageHtmlLink":"Support universal checkpoint for GPTModel (#361)"}},{"before":"31e2584cff349a3b0871f8ea26b1e2e04972706a","after":"a9856ce0e75dbe69c96d4e241e8a191b344118d7","ref":"refs/heads/main","pushedAt":"2024-02-27T17:05:09.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"conglongli","name":"Conglong Li","path":"/conglongli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4238238?s=80&v=4"},"commit":{"message":"Update pretrain_bert.py (#355)\n\nfix a bug that cause\r\nFile \"pretrain_bert.py\", line 91, in loss_func\r\n lm_loss_, sop_logits = output_tensor\r\n ^^^^^^^^^^^^^^^^^^^^\r\nValueError: too many values to unpack (expected 2)","shortMessageHtmlLink":"Update pretrain_bert.py (#355)"}},{"before":"81d68a32bd5f3ddee6ebc9f81819343b0621054b","after":"31e2584cff349a3b0871f8ea26b1e2e04972706a","ref":"refs/heads/main","pushedAt":"2024-02-26T14:36:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tjruwase","name":"Olatunji Ruwase","path":"/tjruwase","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4271600?s=80&v=4"},"commit":{"message":"Support loading checkpoint specific tag (#352)\n\nSigned-off-by: Moshe Island \r\nCo-authored-by: Moshe Island ","shortMessageHtmlLink":"Support loading checkpoint specific tag (#352)"}},{"before":"9ba0dcb7fe2c2338a611b972de44bc116eda416c","after":"81d68a32bd5f3ddee6ebc9f81819343b0621054b","ref":"refs/heads/main","pushedAt":"2024-02-25T23:28:30.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"conglongli","name":"Conglong Li","path":"/conglongli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4238238?s=80&v=4"},"commit":{"message":"Support configuration of RoPE theta (#351)\n\nRequired for models that use theta != 10000 (default).\r\nFor example, Mixtral model uses theta=1000000.\r\n\r\nSigned-off-by: Moshe Island \r\nCo-authored-by: Moshe Island ","shortMessageHtmlLink":"Support configuration of RoPE theta (#351)"}},{"before":"c934137f47148e364ad2794adf91c580597c003a","after":"9ba0dcb7fe2c2338a611b972de44bc116eda416c","ref":"refs/heads/main","pushedAt":"2024-02-22T23:48:24.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Add steps for running TensorBoard analysis in Universal Checkpointing README (#349)\n\nThis PR updates the Universal Checkpointing example README to include instructions for running the TensorBoard analysis and generating the csv and png files.","shortMessageHtmlLink":"Add steps for running TensorBoard analysis in Universal Checkpointing…"}},{"before":"46cad593a60fd7ccdf22b3961659386db01f5277","after":"75de408070bc954603b9749ac5b32e8e8455b390","ref":"refs/heads/lekurile/add_analysis_readme","pushedAt":"2024-02-22T18:25:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Update wording in README","shortMessageHtmlLink":"Update wording in README"}},{"before":"4badda5966ca844051791d61be452d1ed7c0623d","after":"46cad593a60fd7ccdf22b3961659386db01f5277","ref":"refs/heads/lekurile/add_analysis_readme","pushedAt":"2024-02-22T00:24:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Consistent naming","shortMessageHtmlLink":"Consistent naming"}},{"before":"49e11300ba43d0fb4238e130930b4f92b1207d8a","after":"4badda5966ca844051791d61be452d1ed7c0623d","ref":"refs/heads/lekurile/add_analysis_readme","pushedAt":"2024-02-22T00:23:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Update figure number","shortMessageHtmlLink":"Update figure number"}},{"before":"585c3dc251b8657692cd9daf399001a556d6ac34","after":"49e11300ba43d0fb4238e130930b4f92b1207d8a","ref":"refs/heads/lekurile/add_analysis_readme","pushedAt":"2024-02-22T00:22:04.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Fix formatting","shortMessageHtmlLink":"Fix formatting"}},{"before":null,"after":"585c3dc251b8657692cd9daf399001a556d6ac34","ref":"refs/heads/lekurile/add_analysis_readme","pushedAt":"2024-02-22T00:16:42.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Add steps for running TensorBoard analysis in Universal Checkpointing README","shortMessageHtmlLink":"Add steps for running TensorBoard analysis in Universal Checkpointing…"}},{"before":"ea82c1430faa678f7da3a0c125d73ccae7b1ae6a","after":"c934137f47148e364ad2794adf91c580597c003a","ref":"refs/heads/main","pushedAt":"2024-02-21T23:08:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"conglongli","name":"Conglong Li","path":"/conglongli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4238238?s=80&v=4"},"commit":{"message":"Track additional metrics with W&B in `megatron/training.py` (#348)\n\n* Track additional metrics with W&B in `megatron/training.py`\r\n\r\n* Update `megatron/training.py`","shortMessageHtmlLink":"Track additional metrics with W&B in megatron/training.py (#348)"}},{"before":"3a309130b941a571a7b5206ed3caf1b909f9a353","after":"ea82c1430faa678f7da3a0c125d73ccae7b1ae6a","ref":"refs/heads/main","pushedAt":"2024-02-21T21:33:47.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Update Megatron type check (#346)\n\nThis PR updates the Megatron type check to check against the accelerator specific dtype instead of the class. The change is necessary to account for warning fixes in microsoft/DeepSpeed#5018.","shortMessageHtmlLink":"Update Megatron type check (#346)"}},{"before":"35579447e56a60002d77111a4202bb22fcfa16db","after":"3a309130b941a571a7b5206ed3caf1b909f9a353","ref":"refs/heads/main","pushedAt":"2024-02-21T05:15:42.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"conglongli","name":"Conglong Li","path":"/conglongli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4238238?s=80&v=4"},"commit":{"message":"Remove duplicate ctx save backward in cross_entropy.py (#347)","shortMessageHtmlLink":"Remove duplicate ctx save backward in cross_entropy.py (#347)"}},{"before":"d47f3cda3a9316ddc68e7f0ef904d1650ba6419d","after":"35579447e56a60002d77111a4202bb22fcfa16db","ref":"refs/heads/main","pushedAt":"2024-02-21T02:19:15.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tjruwase","name":"Olatunji Ruwase","path":"/tjruwase","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4271600?s=80&v=4"},"commit":{"message":"Add TensorBoard analysis script to Universal Checkpointing Example (#345)\n\n* Clean up UC scripts and update UC README\r\n\r\n* Revert LOAD_TP change\r\n\r\n* Update parallelism degrees\r\n\r\n* UC Matplotlib generation script\r\n\r\n* Add matplotlib code\r\n\r\n* Script rename\r\n\r\n* Source label names using regex\r\n\r\n* Update plot gen script\r\n\r\n* Revert 3D parallelism change\r\n\r\n* regex matches to py variables\r\n\r\n* Move location of script\r\n\r\n* Update regex to search for multi-digit parallelism degrees\r\n\r\n* Create ABC class for analyzer and remove UC specific analysis elements\r\n\r\n* Move args to separate folder, add sns switch\r\n\r\n* add bash script for UC analysis\r\n\r\n* Change name of script\r\n\r\n* Move UC specific label name to class\r\n\r\n* Rename script\r\n\r\n* clean up script\r\n\r\n* Update analyzer return\r\n\r\n* Update bash script\r\n\r\n* remove log_dir\r\n\r\n* Address PR comments","shortMessageHtmlLink":"Add TensorBoard analysis script to Universal Checkpointing Example (#345"}},{"before":null,"after":"4d19834af9788e939d33e79e8086ee9c327bb0ce","ref":"refs/heads/lekurile/update_type_check","pushedAt":"2024-02-21T00:12:58.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Update Megatron type check","shortMessageHtmlLink":"Update Megatron type check"}},{"before":"1f49ee45998df8491d5f864212e12208d21382bb","after":"f66fc37537386fc670093f8e7a93d66b2e08b1f0","ref":"refs/heads/lekurile/uc_plot_script","pushedAt":"2024-02-20T19:36:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"lekurile","name":"Lev Kurilenko","path":"/lekurile","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113481193?s=80&v=4"},"commit":{"message":"Address PR comments","shortMessageHtmlLink":"Address PR comments"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEUHaRWAA","startCursor":null,"endCursor":null}},"title":"Activity · microsoft/Megatron-DeepSpeed"}