DeepSeek-R1 Models Download

Main Models

| Model | # Total Params | # Activated Params | Context Length | Download |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace |
| DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |

Note:
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For architecture details, see the DeepSeek-V3 repository.
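
To put the two parameter counts in the main-model table in proportion, the short sketch below computes the share of parameters listed as activated. The variable names are illustrative only, and the figures are the rounded values from the table.

```python
# Rounded figures from the main-model table, in billions of parameters.
TOTAL_PARAMS_B = 671
ACTIVATED_PARAMS_B = 37

# Share of the total parameter count that the table lists as activated.
activated_fraction = ACTIVATED_PARAMS_B / TOTAL_PARAMS_B

if __name__ == "__main__":
    print(f"Activated share: {activated_fraction:.1%}")  # prints "Activated share: 5.5%"
```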


DeepSeek-R1-Distill Models

| Model | Base Model | Download |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |

Implementation Notes:

  • Distilled models are fine-tuned from open-source base models on samples generated by DeepSeek-R1
  • Their configuration files and tokenizers differ slightly from those of the base models
  • Important: Run these models with our provided settings, not the base models' original configs and tokenizers
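
As a minimal sketch of the last point, the snippet below loads both the tokenizer and the model weights from the same distilled repository, so the modified configuration and tokenizer are used instead of the base model's. It rests on several assumptions: the Hub organization name `deepseek-ai` is not stated in this section, the helpers `repo_id` and `load_distilled` are hypothetical names introduced for illustration, and the `transformers` library must be installed for the loading step.

```python
# Distilled checkpoints and their base models, as listed in the table above.
DISTILL_BASES = {
    "DeepSeek-R1-Distill-Qwen-1.5B": "Qwen2.5-Math-1.5B",
    "DeepSeek-R1-Distill-Qwen-7B": "Qwen2.5-Math-7B",
    "DeepSeek-R1-Distill-Llama-8B": "Llama-3.1-8B",
    "DeepSeek-R1-Distill-Qwen-14B": "Qwen2.5-14B",
    "DeepSeek-R1-Distill-Qwen-32B": "Qwen2.5-32B",
    "DeepSeek-R1-Distill-Llama-70B": "Llama-3.3-70B-Instruct",
}

# Assumption: the Hub organization is not given in this section.
HUB_ORG = "deepseek-ai"


def repo_id(model_name: str) -> str:
    """Build a Hub repository ID for a distilled model named in the table."""
    if model_name not in DISTILL_BASES:
        raise KeyError(f"unknown distilled model: {model_name}")
    return f"{HUB_ORG}/{model_name}"


def load_distilled(model_name: str):
    """Load tokenizer and model from the distilled repo, not the base model's."""
    # Imported lazily so the lookup helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    rid = repo_id(model_name)
    tokenizer = AutoTokenizer.from_pretrained(rid)  # distilled repo's tokenizer
    model = AutoModelForCausalLM.from_pretrained(rid)  # distilled repo's config and weights
    return tokenizer, model


if __name__ == "__main__":
    print(repo_id("DeepSeek-R1-Distill-Qwen-7B"))
```

Calling `load_distilled("DeepSeek-R1-Distill-Qwen-1.5B")` downloads the checkpoint, so it is left out of the `__main__` block; the smallest model is the most practical one to try first.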