DeepSeek-R1-Distill-Qwen-1.5B
Introduction
DeepSeek-R1-Distill-Qwen-1.5B is a roughly 1.5-billion-parameter model, fine-tuned from an open-source base model on samples generated by DeepSeek-R1. Key highlights of this model include:
Type: Causal Language Model
Training Stage: Pretraining & Post-training
Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
Number of Parameters: 1.54B (1.31B non-embedding)
Number of Layers: 28
Number of Attention Heads (GQA): 12 for Q and 2 for KV
Context Length: Full 131,072 tokens and generation up to 8,192 tokens
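The layer and GQA figures above determine how much memory the key/value cache needs per token. The following is a rough sketch of that arithmetic, not a measurement. It assumes a head dimension of 128 and 16-bit cache entries; neither value is stated in this card, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(tokens, layers=28, kv_heads=2, head_dim=128, bytes_per_value=2):
    """Estimate KV-cache size for a given number of cached tokens.

    layers and kv_heads come from the highlights above.
    head_dim=128 and bytes_per_value=2 (16-bit) are assumptions.
    """
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


per_token = kv_cache_bytes(1)
full_context_gib = kv_cache_bytes(131_072) / 2**30
# Same model with one KV head per query head (no grouping), for comparison.
no_gqa_ratio = kv_cache_bytes(1, kv_heads=12) / per_token

print(per_token)         # 28672 bytes per token
print(full_context_gib)  # 3.5 GiB at the full 131,072-token context
print(no_gqa_ratio)      # 6.0
```

Under these assumptions, sharing 2 KV heads across 12 query heads makes the cache six times smaller than it would be with one KV head per query head.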
Available NPU Models
Base Model
deepseek-r1-1.5B-ax630c
The Base Model provides a 128-token context window and a maximum output of 1,024 tokens.
Support Platforms: LLM630 Compute Kit, Module LLM, and Module LLM Kit
128 context window
1,024 max output tokens
TTFT (time to first token): 1075.04 ms
Average speed: 3.57 tokens/s
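The two performance figures above (1075.04 ms to first token, 3.57 tokens per second) can be combined into a rough response-time estimate. This sketch assumes that the average rate applies uniformly to every generated token after the first-token delay, which the card does not state; `estimate_seconds` is a hypothetical helper name.

```python
def estimate_seconds(output_tokens, ttft_ms=1075.04, tokens_per_second=3.57):
    """Approximate wall-clock time for one response.

    Assumption: total time = time to first token + output_tokens / average rate.
    Defaults are the Base Model figures listed above.
    """
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


# Short answer versus the 1,024-token maximum output.
print(round(estimate_seconds(64), 1))
print(round(estimate_seconds(1024), 1))  # roughly 288 seconds
```

At this rate, a response that uses the full 1,024-token output budget takes close to five minutes, so shorter generation limits may be preferable for interactive use.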
Install
apt install llm-model-deepseek-r1-1.5b-ax630c
Manual installation: Click here to download llm-model-deepseek-r1-1.5b-ax630c
Long-Context Model
deepseek-r1-1.5B-p256-ax630c
Compared to the Base Model, the Long-Context Model provides an extended 256-token context window with the same maximum output of 1,024 tokens.
Support Platforms: LLM630 Compute Kit, Module LLM, and Module LLM Kit
256 context window
1,024 max output tokens
TTFT (time to first token): 3056.86 ms
Average speed: 3.57 tokens/s
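Because both variants list the same 3.57 tokens per second, the practical difference in response time comes from the time to first token. The sketch below compares the two listed values (1075.04 ms for the 128-token model, 3056.86 ms for the 256-token model). The card does not say what prompt length these were measured with, so the comparison is illustrative only; the dictionary keys are hypothetical labels.

```python
# Time-to-first-token figures as listed in this document, in milliseconds.
ttft_ms = {
    "base_128": 1075.04,
    "long_256": 3056.86,
}

extra_delay_ms = ttft_ms["long_256"] - ttft_ms["base_128"]
ttft_ratio = ttft_ms["long_256"] / ttft_ms["base_128"]

print(round(extra_delay_ms, 2))  # 1981.82 ms of additional startup delay
print(round(ttft_ratio, 2))      # 2.84
```

The context window doubles while the listed first-token delay grows by about 2.84 times, so the Base Model may be the better fit when prompts are short.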
Install
apt install llm-model-deepseek-r1-1.5b-p256-ax630c
Manual installation: Click here to download llm-model-deepseek-r1-1.5b-p256-ax630c