References

  1. Jay Mody. GPT in 60 Lines of NumPy. https://jaykmody.com/blog/gpt-from-scratch/ . 2023. Original code: https://github.com/jaymody/picoGPT ; Chinese translation: https://jiqihumanr.github.io/2023/04/13/gpt-from-scratch/ .
  2. Jay Alammar. The Illustrated GPT-2 (Visualizing Transformer Language Models). http://jalammar.github.io/illustrated-gpt2/ .
  3. Jay Alammar. The Illustrated Transformer. http://jalammar.github.io/illustrated-transformer/ . 2020.
  4. Andrej Karpathy. Let's build GPT. https://www.youtube.com/watch?v=kCc8FmEb1nY . 2023. Code: https://github.com/karpathy/nanoGPT .
  5. Why some of the algorithms in GPT/Transformer are used (in Chinese). https://zhuanlan.zhihu.com/p/559495068 .
  6. CodeGeeX. Figure 2 is a very clear diagram of the GPT model architecture (note the extra Top Query Layer compared with the standard architecture). https://arxiv.org/pdf/2303.17568 .
  7. LLM Visualization. https://bbycroft.net/llm .
  8. Quantitative estimates of Transformer memory and compute (a worked sizing sketch follows this list):
    1. Analyzing the parameter count, compute, intermediate activations, and KV cache of Transformer models (in Chinese). https://zhuanlan.zhihu.com/p/624740065 .
    2. Transformer Math 101. https://blog.eleuther.ai/transformer-math/ .
    3. Transformer Inference Arithmetic. https://kipp.ly/transformer-inference-arithmetic/ .
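
To make these estimates concrete, here is a minimal Python sketch of two standard back-of-the-envelope formulas from the references above: the parameter count of a decoder-only Transformer (roughly 12 · n_layer · d_model² from the attention and MLP weights, plus embeddings, ignoring biases and LayerNorm) and the fp16 KV-cache size. The example dimensions are GPT-2 XL-like and purely illustrative.

```python
def param_count(n_layer: int, d_model: int, vocab: int) -> int:
    """Weights only: 4*d^2 (attention) + 8*d^2 (MLP) per layer, plus embeddings."""
    return n_layer * 12 * d_model**2 + vocab * d_model

def kv_cache_bytes(n_layer: int, d_model: int, seq_len: int,
                   batch: int = 1, bytes_per_elem: int = 2) -> int:
    """One K and one V vector of size d_model per token per layer (fp16 default)."""
    return 2 * n_layer * d_model * seq_len * batch * bytes_per_elem

# GPT-2 XL-like dimensions: 48 layers, d_model=1600, vocab=50257.
print(f"params ~ {param_count(48, 1600, 50257) / 1e9:.2f} B")      # ~1.55 B
print(f"KV cache @ 2048 tokens ~ "
      f"{kv_cache_bytes(48, 1600, 2048) / 2**20:.0f} MiB")         # ~600 MiB
```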

Further Reading

  1. The Annotated Transformer. http://nlp.seas.harvard.edu/annotated-transformer/
  2. Transformers Explained Visually (Part 3): Multi-head Attention, deep dive. https://towardsdatascience.com/transformers-explained-visually-part-3-multi-head-attention-deep-dive-1c1ff1024853 .
  3. Why multi-head self attention works. https://theaisummer.com/self-attention/ . (A minimal multi-head attention sketch in NumPy follows this list.)
  4. 梁德澎 (Liang Depeng). RoPE: understand the rotary position embedding in LLaMA in one article (in Chinese). https://mp.weixin.qq.com/s/0peSNWN0ypMopPR0Q_pujQ . (A minimal RoPE sketch follows this list.)
  5. Large-model inference: from model analysis to compute optimization, part 1 (in Chinese). https://mp.weixin.qq.com/s/VaRvrtcNRLzDntE6fPJSIw .
  6. Large-model inference: from model analysis to compute optimization, part 2 (in Chinese). https://mp.weixin.qq.com/s/tlGtr1fOTFElTuGHKyHKgQ .
  7. vLLM. https://github.com/vllm-project/vllm .
  8. 4-bit Quantization with GPTQ. (A naive 4-bit quantization sketch follows this list.)
    1. https://towardsdatascience.com/4-bit-quantization-with-gptq-36b0f4f02c34
    2. GPTQ paper: https://arxiv.org/abs/2210.17323 . Code: https://github.com/IST-DASLab/gptq .
  9. Large Transformer Model Inference Optimization. https://lilianweng.github.io/posts/2023-01-10-inference-optimization/
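
To accompany items 2 and 3, below is a minimal NumPy sketch of multi-head self-attention in the spirit of the picoGPT reference: the input is projected to queries, keys, and values, split into per-head subspaces, attended independently, and concatenated. Causal masking is omitted for brevity, and the weight initialization is illustrative only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, n_head):
    """Multi-head self-attention over x of shape (seq_len, d_model).

    wq/wk/wv/wo are all (d_model, d_model); each head attends in its own
    d_model // n_head subspace. No causal mask, for brevity.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_head
    # Project, then split the feature dim into heads: (n_head, seq_len, d_head).
    q, k, v = (np.reshape(x @ w, (seq_len, n_head, d_head)).transpose(1, 0, 2)
               for w in (wq, wk, wv))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)     # (n_head, seq, seq)
    out = softmax(scores) @ v                               # (n_head, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # concatenate heads
    return out @ wo

d_model, n_head = 64, 4
x = np.random.randn(10, d_model)
ws = [np.random.randn(d_model, d_model) / np.sqrt(d_model) for _ in range(4)]
print(multi_head_attention(x, *ws, n_head=n_head).shape)    # (10, 64)
```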
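
For item 4, here is a minimal NumPy sketch of rotary position embedding (RoPE): each dimension pair (2i, 2i+1) of a query or key vector at position pos is rotated by the angle pos · θ_i, with θ_i = base^(−2i/d) and base = 10000 as in the RoPE paper. This illustrates the idea only; it is not LLaMA's exact implementation.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, d), d even."""
    seq_len, d = x.shape
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    theta = base ** (-np.arange(0, d, 2) / d)      # (d/2,) per-pair frequencies
    angles = pos * theta                           # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x[:, 0::2] * cos - x[:, 1::2] * sin   # 2-D rotation per pair
    out[:, 1::2] = x[:, 0::2] * sin + x[:, 1::2] * cos
    return out

# Applied to queries and keys before the dot product, attention scores depend
# only on relative position: rope(q)[m] @ rope(k)[n] is a function of m - n.
q = np.random.randn(8, 64)
print(rope(q).shape)  # (8, 64)
```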
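
For item 8, the sketch below is plain round-to-nearest symmetric 4-bit quantization with one scale per row, only to make the storage format concrete. It is NOT the GPTQ algorithm, which additionally compensates quantization error using second-order (Hessian) information, as described in the paper above; the per-row scaling granularity here is an assumption for illustration.

```python
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Naive symmetric round-to-nearest 4-bit quantization (not GPTQ)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7   # int4 range is [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_4bit(w)
print(f"max abs reconstruction error: {np.abs(w - dequantize_4bit(q, s)).max():.4f}")
```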