Large Language Model
November 22, 2023
Pre-training paradigms
- ARM (Auto-Regressive Model)
- MDM (Masked Diffusion Model)
- AE (Auto-Encoding)
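The three paradigms above differ mainly in how a training example is built from raw text. The sketch below shows that difference under simplifying assumptions: ARM turns every prefix into a next-token target; AE masks a fixed fraction (15% here, BERT-style, without the 80/10/10 replacement trick); MDM draws a random noise level t, masks each token with probability t, and weights the masked-token loss by 1/t, which is one common formulation assuming a linear masking schedule. The `MASK` token and all function names are hypothetical, and no model or loss is computed.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder mask token


def arm_pairs(tokens):
    """ARM: every prefix predicts the token that follows it (left-to-right context only)."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mask_tokens(tokens, mask_rate, rng):
    """Replace each token with MASK independently with probability mask_rate.

    Returns the corrupted sequence and a {position: original_token} map of targets.
    """
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


def ae_example(tokens, rng, mask_rate=0.15):
    """AE (simplified BERT-style): fixed mask rate; masked tokens are predicted
    from context on both sides."""
    return mask_tokens(tokens, mask_rate, rng)


def mdm_example(tokens, rng):
    """MDM: draw a noise level t in (0, 1], mask each token with probability t,
    and return 1/t as the weight for the masked-token cross-entropy."""
    t = 1.0 - rng.random()  # rng.random() is in [0, 1), so t is in (0, 1]
    corrupted, targets = mask_tokens(tokens, t, rng)
    return corrupted, targets, 1.0 / t


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the cat sat on the mat".split()
    print(arm_pairs(sentence)[:2])  # [(['the'], 'cat'), (['the', 'cat'], 'sat')]
    print(ae_example(sentence, rng))
    print(mdm_example(sentence, rng))
```

In this sketch AE is effectively MDM frozen at a single noise level, which is why both reuse `mask_tokens`; the real difference is that MDM trains across all masking ratios so it can generate text by iterative unmasking.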
Large-scale data + Transformer + self-supervised learning + autoregressive/autoencoding objectives + generalization as the priority
- Pre-training
- Post-training / Alignment
  - Supervised / Instruction Fine-Tuning (SFT)
  - Human preference alignment / reinforcement learning
    - RLHF (Reinforcement Learning from Human Feedback)
    - DPO (Direct Preference Optimization)
    - PPO (Proximal Policy Optimization)
    - GRPO (Group Relative Policy Optimization)
    - RLAIF (Reinforcement Learning from AI Feedback)
    - Reward Model
  - Alignment & Safety Tuning
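The core per-example quantities behind several of the preference-alignment methods listed above can be sketched in plain Python. This is a minimal illustration, not any library's implementation: all function names are hypothetical, log-probabilities are assumed to be summed over the response tokens, and the defaults `beta=0.1` (DPO) and `eps=0.2` (PPO clipping) are assumed common values. The reward model uses the Bradley-Terry pairwise loss; DPO skips the reward model and compares policy vs. frozen reference log-probs directly; PPO uses a clipped probability ratio; GRPO replaces PPO's value network with rewards normalized within a group of sampled responses (population std here; implementations differ).

```python
import math
from statistics import mean, pstdev


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def reward_model_loss(r_chosen, r_rejected):
    """Bradley-Terry loss: push the chosen response's score above the rejected one's."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Arguments are sequence log-probs under the trained policy (pi_*) and the
    frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -log_sigmoid(margin)


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate for one action; training maximizes this value."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)


def grpo_advantages(rewards, eps=1e-8):
    """GRPO: advantage of each sampled response = (reward - group mean) / group std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # ln 2, about 0.693, when policy equals reference
    print(ppo_clip_objective(math.log(2.0), 0.0, 1.0))  # ratio 2 is clipped to 1.2
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # roughly [1, -1, 1, -1]
```

At initialization the DPO policy equals the reference, so every pair starts at a loss of ln 2; the loss falls only as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference.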