DeepSeek

The team first trains DeepSeek-R1-Zero.

DeepSeek-R1-Zero reasons very well, but its language fluency and quality are poor. This is expected, since at this stage it is trained only on its own generations rather than on real language data.

To address this, DeepSeek-R1 uses the following training process:

Finally, DeepSeek-R1 serves as a teacher: its reasoning capabilities are distilled into Qwen2.5 and Llama3.1 student models.
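
The notes above do not say how the distillation is carried out. One common reading is sequence-level distillation: the teacher generates full answers, and the student is then fine-tuned on those answers with ordinary supervised learning. The sketch below only illustrates the data-collection half of that idea, under that assumption. The names `build_distillation_set`, `teacher_generate`, `keep`, and `toy_teacher` are hypothetical and are not part of any DeepSeek API.

```python
from typing import Callable, Dict, List


def build_distillation_set(
    prompts: List[str],
    teacher_generate: Callable[[str], str],  # hypothetical wrapper around the teacher model
    keep: Callable[[str, str], bool] = lambda prompt, completion: True,
) -> List[Dict[str, str]]:
    """Collect (prompt, teacher completion) pairs for fine-tuning a student.

    `keep` is an optional filter (assumed here, not taken from the notes)
    that can discard low-quality teacher outputs before training.
    """
    examples = []
    for prompt in prompts:
        completion = teacher_generate(prompt)
        if keep(prompt, completion):
            examples.append({"prompt": prompt, "completion": completion})
    return examples


def toy_teacher(prompt: str) -> str:
    """Toy stand-in for the teacher so the sketch runs end to end."""
    return f"<think>working on: {prompt}</think> answer"


dataset = build_distillation_set(["2 + 2 = ?", "Is 7 prime?"], toy_teacher)
```

In a real setup, `toy_teacher` would be replaced by calls to the teacher model, and the resulting pairs would be passed to whatever supervised fine-tuning pipeline the student model uses.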