Survey by strint, developed through discussions with BBuff, Chengpeng, Juncheng, and Wenxiao.
This article is maintained and updated at strint.github.io in order to track ongoing Megatron-LM developments.
Contents
Megatron-LM paper 1: model parallelism (the model-parallelism part of this survey is based on this paper). https://arxiv.org/abs/1909.08053 . 2020.
Megatron-LM GTC 2020 talk.
s21496-megatron-lm-training-multi-billion-parameter-language-models-using-model-parallelism.pdf
Li Mu. Megatron-LM paper reading. https://www.youtube.com/watch?v=mk8FuiDmf0I .
Megatron-LM paper 2: pipeline parallelism and 3D parallelism (the model-parallelism and hybrid-parallelism parts of this survey are based on this paper). https://arxiv.org/abs/2104.04473 . 2021.
Analysis of the Selene supercomputer, Megatron-LM's benchmarking environment.
s31700-janneblomqvist-understanding-selene_1616715331011001YQKN.pdf
DGX A100 configuration (an excellent overview). https://www.microway.com/hpc-tech-tips/dgx-a100-review-throughput-and-hardware-summary/ .
Analysis of collective-communication volume.
NeMo (rumored to be the project intended to replace Megatron-LM). https://github.com/NVIDIA/NeMo
Megatron-LM technical blog posts.
BLOOM Megatron training log. https://www.cnblogs.com/Matrix_Yao/p/17238627.html .
Megatron-LM performance analysis.
Megatron-Turing NLG 530B. https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
Reducing Activation Recomputation in Large Transformer Models. https://arxiv.org/abs/2205.05198 . 2022.
FriendliAI. Accelerating LLM Training with Memory-Balanced Pipeline Parallelism. https://medium.com/friendliai/accelerating-llm-training-with-memory-balanced-pipeline-parallelism-3fdfb6ec2c80 . 2023-07.
Previously, large models were trained mainly with data parallelism; Megatron-LM provides comparatively mature support for model (tensor) parallelism and pipeline parallelism for LLMs.
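To make the contrast concrete, below is a minimal single-process sketch of the tensor (model) parallelism idea from the first Megatron-LM paper: the first weight of a Transformer MLP block is split by columns and the second by rows, so each shard computes locally and a single summation at the end (an all-reduce on real hardware) recovers the full result. The tensor-parallel degree, the toy dimensions, and the explicit loop over simulated ranks are assumptions for illustration; actual Megatron-LM shards the weights across GPUs and uses NCCL collectives through torch.distributed.

```python
# Minimal single-process sketch of Megatron-LM style tensor (model) parallelism
# for a Transformer MLP block: weight A is split by columns, weight B by rows,
# so the nonlinearity is applied locally and only one all-reduce is needed.
# The loop over "ranks" simulates GPUs; real Megatron-LM uses NCCL all-reduce.
import torch

torch.manual_seed(0)
tp = 2                            # tensor-parallel degree (simulated ranks)
batch, d_model, d_ff = 4, 8, 32   # toy sizes, chosen only for illustration

x = torch.randn(batch, d_model)
A = torch.randn(d_model, d_ff)    # first MLP weight, sharded by columns
B = torch.randn(d_ff, d_model)    # second MLP weight, sharded by rows

y_ref = torch.relu(x @ A) @ B     # reference: the unsharded computation

A_shards = A.chunk(tp, dim=1)     # each rank holds a column slice of A ...
B_shards = B.chunk(tp, dim=0)     # ... and the matching row slice of B

partials = []
for rank in range(tp):
    h = torch.relu(x @ A_shards[rank])    # local GEMM + activation, no comm
    partials.append(h @ B_shards[rank])   # partial result of the second GEMM

y_tp = sum(partials)              # the "all-reduce": summing partials gives the full output
print(torch.allclose(y_ref, y_tp, atol=1e-5))  # True
```

This is why Megatron-LM splits the two MLP matrices in opposite directions: the activation between them never needs to be gathered, so the whole block costs only one all-reduce in the forward pass.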