References
- [Done] Jay Mody. GPT in 60 Lines of NumPy. https://jaykmody.com/blog/gpt-from-scratch/ . 2023. Original code: https://github.com/jaymody/picoGPT ; Chinese translation: https://jiqihumanr.github.io/2023/04/13/gpt-from-scratch/ .
- [Done] Jay Alammar. The Illustrated GPT-2 (Visualizing Transformer Language Models). http://jalammar.github.io/illustrated-gpt2/ .
- [Done] Jay Alammar. The Illustrated Transformer. http://jalammar.github.io/illustrated-transformer/ . 2020.
- [Done] Andrej Karpathy. Let's build GPT. https://www.youtube.com/watch?v=kCc8FmEb1nY . 2023. Code: https://github.com/karpathy/nanoGPT .
- [Done] Reasons behind some of the algorithmic choices in GPT/Transformer. https://zhuanlan.zhihu.com/p/559495068 .
- Quantitative estimates of Transformer memory and compute (a back-of-the-envelope sketch follows this list).
    - Analyzing a Transformer's parameter count, compute, intermediate activations, and KV cache. https://zhuanlan.zhihu.com/p/624740065
    - Transformer Math 101. https://blog.eleuther.ai/transformer-math/
    - Transformer Inference Arithmetic. https://kipp.ly/transformer-inference-arithmetic/
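
The posts above converge on a few rules of thumb: roughly 12·n_layer·d_model² parameters for a decoder-only model (excluding embeddings), about 2 FLOPs per parameter per token for a forward pass, and 2·n_layer·d_model cached K/V values per token. A minimal sketch of those estimates; the function name `transformer_estimates` and the configuration (roughly GPT-3 13B scale) are illustrative assumptions, not taken from any single post:

```python
# Back-of-the-envelope estimates for a decoder-only Transformer.
def transformer_estimates(n_layer, d_model, n_vocab, seq_len, batch, bytes_per_elem=2):
    # Each block holds ~4*d^2 attention params + ~8*d^2 MLP params (4x hidden),
    # i.e. ~12*d^2 per layer; embeddings add n_vocab*d.
    n_params = 12 * n_layer * d_model ** 2 + n_vocab * d_model

    # Forward pass: ~2 FLOPs per parameter per token (one multiply + one add
    # per weight in each matmul); training costs roughly 3x this per token.
    flops_per_token_fwd = 2 * n_params

    # KV cache: one K and one V vector of size d_model per layer per token.
    kv_cache_bytes = 2 * n_layer * d_model * seq_len * batch * bytes_per_elem

    return n_params, flops_per_token_fwd, kv_cache_bytes


n_params, flops, kv = transformer_estimates(
    n_layer=40, d_model=5120, n_vocab=50257, seq_len=2048, batch=1
)
print(f"params          ~ {n_params / 1e9:.1f} B")
print(f"fwd FLOPs/token ~ {flops / 1e9:.1f} GFLOPs")
print(f"KV cache (fp16) ~ {kv / 2**30:.2f} GiB")
```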
Further reading
- The Annotated Transformer. http://nlp.seas.harvard.edu/annotated-transformer/
- Transformers Explained Visually (Part 3): Multi-head Attention, deep dive. https://towardsdatascience.com/transformers-explained-visually-part-3-multi-head-attention-deep-dive-1c1ff1024853 .
- Why multi-head self attention works. https://theaisummer.com/self-attention/ .
- 梁德澎. RoPE: a one-article guide to the rotary position embedding in LLaMA. https://mp.weixin.qq.com/s/0peSNWN0ypMopPR0Q_pujQ
- Large-model inference: from model analysis to compute optimization (Part 1). https://mp.weixin.qq.com/s/VaRvrtcNRLzDntE6fPJSIw
- Large-model inference: from model analysis to compute optimization (Part 2). https://mp.weixin.qq.com/s/tlGtr1fOTFElTuGHKyHKgQ
- vllm. https://github.com/vllm-project/vllm
- 4-bit Quantization with GPTQ. https://towardsdatascience.com/4-bit-quantization-with-gptq-36b0f4f02c34
    - GPTQ paper: https://arxiv.org/abs/2210.17323 ; code: https://github.com/IST-DASLab/gptq .
- Large Transformer Model Inference Optimization. https://lilianweng.github.io/posts/2023-01-10-inference-optimization/
- FlashAttention2 (faster attention). Paper: https://tridao.me/publications/flash2/flash2.pdf .
    - How the FlashAttention optimization is implemented: https://www.zhihu.com/question/611236756/answer/3132304304 .
    - Blog: https://princeton-nlp.github.io/flash-atttention-2/ .
    - Blog 2: https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad . 2023-07.
- Grouped-query attention (GQA). Used in LLaMA 2 as a replacement for MHA; improves inference speed without hurting quality (see the sketch after this list). Paper: https://arxiv.org/abs/2305.13245 .
    - Related implementation in flash-attention: https://github.com/Dao-AILab/flash-attention/blob/d1a3b52f17b914c93bf740654387b566a7330687/flash_attn/flash_attn_interface.py#L385 .
    - A GQA implementation in PyTorch: https://github.com/Oneflow-Inc/megatron-lm/pull/73/files .
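
The GQA entry above only states the idea; here is a minimal PyTorch sketch of it (the helper name `gqa`, the shapes, and the head counts are illustrative assumptions, not code from the linked repos). The inference saving comes from the K/V projections and the KV cache shrinking by a factor of n_head / n_kv_head:

```python
import torch
import torch.nn.functional as F

def gqa(q, k, v):
    # q: (batch, n_head, seq, head_dim); k, v: (batch, n_kv_head, seq, head_dim).
    # Each group of n_head // n_kv_head query heads shares one K/V head.
    n_head, n_kv_head = q.shape[1], k.shape[1]
    group = n_head // n_kv_head
    k = k.repeat_interleave(group, dim=1)   # broadcast K/V heads to query heads
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# MHA is the special case n_kv_head == n_head; MQA is n_kv_head == 1.
q = torch.randn(1, 32, 128, 64)
k = torch.randn(1, 8, 128, 64)   # 8 K/V heads -> 4x smaller KV cache than MHA
v = torch.randn(1, 8, 128, 64)
out = gqa(q, k, v)               # (1, 32, 128, 64)
```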