Survey by strint, developed through discussions with BBuff, Chengpeng, Juncheng, and Wenxiao.
This article is maintained and updated at strint.github.io in order to track ongoing Megatron-LM developments.
Contents
Megatron-LM paper 1: model parallelism (the model-parallelism part of this survey is based on this paper). https://arxiv.org/abs/1909.08053 . 2020.
Megatron-LM GTC 2020 talk.
s21496-megatron-lm-training-multi-billion-parameter-language-models-using-model-parallelism.pdf
Li Mu. Megatron-LM paper reading. https://www.youtube.com/watch?v=mk8FuiDmf0I .
Megatron-LM paper 2: pipeline parallelism and 3D parallelism (the model-parallelism and hybrid-parallelism parts of this survey are based on this paper). https://arxiv.org/abs/2104.04473 . 2021.
Analysis of the Selene supercomputer, Megatron-LM's benchmarking environment.
s31700-janneblomqvist-understanding-selene_1616715331011001YQKN.pdf
DGX A100 configuration (an excellent overview). https://www.microway.com/hpc-tech-tips/dgx-a100-review-throughput-and-hardware-summary/ .
Analysis of collective-communication volume.
NeMo (rumored to be the project intended to replace Megatron-LM). https://github.com/NVIDIA/NeMo
Megatron-LM technical blog posts.
BLOOM Megatron training log. https://www.cnblogs.com/Matrix_Yao/p/17238627.html .
Megatron-LM performance analysis.
Megatron-Turing NLG 530B. https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
Reducing Activation Recomputation in Large Transformer Models. https://arxiv.org/abs/2205.05198 . 2022.
FriendliAI. Accelerating LLM Training with Memory-Balanced Pipeline Parallelism. https://medium.com/friendliai/accelerating-llm-training-with-memory-balanced-pipeline-parallelism-3fdfb6ec2c80 . 2023-07.
Previously, large models were trained mainly with data parallelism; Megatron-LM provides comparatively mature support for model (tensor) parallelism and pipeline parallelism for LLMs.
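To make the contrast concrete, below is a minimal single-process sketch of the tensor (model) parallelism idea from the first Megatron-LM paper: the first weight of a Transformer MLP block is split by columns and the second by rows, so each shard computes locally and a single summation at the end (an all-reduce on real hardware) recovers the full result. The tensor-parallel degree, the toy dimensions, and the explicit loop over simulated ranks are assumptions for illustration; actual Megatron-LM shards the weights across GPUs and uses NCCL collectives through torch.distributed.

```python
# Minimal single-process sketch of Megatron-LM style tensor (model) parallelism
# for a Transformer MLP block: weight A is split by columns, weight B by rows,
# so the nonlinearity is applied locally and only one all-reduce is needed.
# The loop over "ranks" simulates GPUs; real Megatron-LM uses NCCL all-reduce.
import torch

torch.manual_seed(0)
tp = 2                            # tensor-parallel degree (simulated ranks)
batch, d_model, d_ff = 4, 8, 32   # toy sizes, chosen only for illustration

x = torch.randn(batch, d_model)
A = torch.randn(d_model, d_ff)    # first MLP weight, sharded by columns
B = torch.randn(d_ff, d_model)    # second MLP weight, sharded by rows

y_ref = torch.relu(x @ A) @ B     # reference: the unsharded computation

A_shards = A.chunk(tp, dim=1)     # each rank holds a column slice of A ...
B_shards = B.chunk(tp, dim=0)     # ... and the matching row slice of B

partials = []
for rank in range(tp):
    h = torch.relu(x @ A_shards[rank])    # local GEMM + activation, no comm
    partials.append(h @ B_shards[rank])   # partial result of the second GEMM

y_tp = sum(partials)              # the "all-reduce": summing partials gives the full output
print(torch.allclose(y_ref, y_tp, atol=1e-5))  # True
```

This is why Megatron-LM splits the two MLP matrices in opposite directions: the activation between them never needs to be gathered, so the whole block costs only one all-reduce in the forward pass.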