DeepSeek-R1 Models Download
Main Models
Model | #Total Params | #Activated Params | Context Length | Download |
---|---|---|---|---|
DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
Note:
DeepSeek-R1-Zero and DeepSeek-R1 are trained from DeepSeek-V3-Base. For details of the model architecture, see the DeepSeek-V3 repository.
DeepSeek-R1-Distill Models
Model | Base Model | Download |
---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace |
DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace |
DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace |
DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace |
DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace |
DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
Implementation Notes:
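- Distilled models are fine-tuned from open-source base models on samples generated by DeepSeek-R1.
- Their configuration files and tokenizers differ slightly from those of the base models.
- Important: Run these models with the configuration and tokenizer files provided with each distilled checkpoint, not the base models' originals.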
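
As a minimal sketch of what using the provided settings can look like in practice, the snippet below loads a distilled checkpoint together with the tokenizer and configuration that ship in the same repository. It assumes the Hugging Face `transformers` library and assumes the checkpoints are published under the `deepseek-ai` organization; the helper names `repo_id` and `load_distilled` are hypothetical and exist only for illustration.

```python
# Assumption: the distilled checkpoints are hosted under this Hugging Face organization.
HF_ORG = "deepseek-ai"

# Distilled model -> base model, copied from the table above.
DISTILL_MODELS = {
    "DeepSeek-R1-Distill-Qwen-1.5B": "Qwen2.5-Math-1.5B",
    "DeepSeek-R1-Distill-Qwen-7B": "Qwen2.5-Math-7B",
    "DeepSeek-R1-Distill-Llama-8B": "Llama-3.1-8B",
    "DeepSeek-R1-Distill-Qwen-14B": "Qwen2.5-14B",
    "DeepSeek-R1-Distill-Qwen-32B": "Qwen2.5-32B",
    "DeepSeek-R1-Distill-Llama-70B": "Llama-3.3-70B-Instruct",
}


def repo_id(model_name: str) -> str:
    """Build the (assumed) Hub repository ID for a distilled model."""
    if model_name not in DISTILL_MODELS:
        raise KeyError(f"unknown distilled model: {model_name}")
    return f"{HF_ORG}/{model_name}"


def load_distilled(model_name: str):
    """Hypothetical helper: load the tokenizer and weights from the same repository.

    Both objects come from the distilled checkpoint, so the modified config
    and tokenizer are used instead of the base model's originals.
    """
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    rid = repo_id(model_name)
    tokenizer = AutoTokenizer.from_pretrained(rid)
    model = AutoModelForCausalLM.from_pretrained(rid)
    return tokenizer, model
```

Calling `load_distilled("DeepSeek-R1-Distill-Qwen-1.5B")` downloads the checkpoint on first use, so it needs network access and enough disk space and memory for the chosen model. The point of the sketch is that the tokenizer is never taken from the base model's repository.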