DeepSeek-R1-Distill-Qwen-1.5B 

Introduction

DeepSeek-R1-Distill-Qwen-1.5B is fine-tuned from an open-source base model using samples generated by DeepSeek-R1, and has 1.54 billion parameters. Key highlights of this model include:

Type: Causal Language Model
Training Stage: Pretraining & Post-training
Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
Number of Parameters: 1.54B (1.31B non-embedding)
Number of Layers: 28
Number of Attention Heads (GQA): 12 for Q and 2 for KV
Context Length: Full 131,072 tokens and generation up to 8,192 tokens
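
Two quantities follow directly from the figures listed above: the size of the embedding table and the number of query heads that share each key/value head under GQA. The snippet below is only a small arithmetic sketch; the constant and function names are illustrative and not part of any model API.

```python
# Figures copied from the highlights list; names here are illustrative only.
TOTAL_PARAMS_B = 1.54          # total parameters, in billions
NON_EMBEDDING_PARAMS_B = 1.31  # non-embedding parameters, in billions
NUM_Q_HEADS = 12               # query heads
NUM_KV_HEADS = 2               # key/value heads (GQA)


def embedding_params_b(total_b: float, non_embedding_b: float) -> float:
    """Parameters attributed to embeddings, in billions (rounded to 2 places)."""
    return round(total_b - non_embedding_b, 2)


def q_heads_per_kv_head(q_heads: int, kv_heads: int) -> int:
    """Number of query heads sharing one key/value head under GQA."""
    if kv_heads <= 0 or q_heads % kv_heads != 0:
        raise ValueError("query heads must be a positive multiple of KV heads")
    return q_heads // kv_heads


print(embedding_params_b(TOTAL_PARAMS_B, NON_EMBEDDING_PARAMS_B))  # 0.23
print(q_heads_per_kv_head(NUM_Q_HEADS, NUM_KV_HEADS))              # 6
```

With 6 query heads per key/value head, the key/value cache holds 2 heads per layer rather than 12, which is the main memory saving GQA provides.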

Available NPU Models

Base Model

deepseek-r1-1.5B-ax630c

The Base Model provides a 128-token context window and a maximum output of 1,024 tokens.

Support Platforms: LLM630 Compute Kit, Module LLM, and Module LLM Kit

Context Window: 128 tokens
Max Output: 1,024 tokens
TTFT (time to first token): 1075.04 ms
Average Speed: 3.57 tokens/s
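
The two timing figures above can be combined into a rough estimate of how long a full response takes. The sketch below assumes a simple model that is not stated in this document: the time to first token is paid once, and the remaining tokens arrive at the average rate. The function name is hypothetical, and real timings will vary with prompt length and device load.

```python
def estimate_generation_seconds(ttft_ms: float,
                                tokens_per_second: float,
                                output_tokens: int) -> float:
    """Rough wall-clock estimate: TTFT once, then tokens at the average rate.

    This is an assumed model for illustration, not a measured guarantee.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if output_tokens < 0:
        raise ValueError("output_tokens must not be negative")
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


# Figures reported for deepseek-r1-1.5B-ax630c.
BASE_TTFT_MS = 1075.04
BASE_TOKENS_PER_SECOND = 3.57

for n in (64, 256, 1024):
    seconds = estimate_generation_seconds(BASE_TTFT_MS, BASE_TOKENS_PER_SECOND, n)
    print(f"{n} output tokens: about {seconds:.1f} s")
```

Under this assumption, a response that uses the full 1,024-token output budget takes several minutes, so limiting output length matters more for latency than the time to first token.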

Install

apt install llm-model-deepseek-r1-1.5b-ax630c

Manual installation: Click here to download llm-model-deepseek-r1-1.5b-ax630c

Long-Context Model

deepseek-r1-1.5B-p256-ax630c

Compared to the Base Model, the Long-Context Model provides extended context capabilities, offering a 256-token context window and a maximum of 1,024 output tokens.

Support Platforms: LLM630 Compute Kit, Module LLM, Module LLM Kit

Context Window: 256 tokens
Max Output: 1,024 tokens
TTFT (time to first token): 3056.86 ms
Average Speed: 3.57 tokens/s
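
Because both variants report the same average speed but different time to first token, one reasonable policy is to use the smallest variant whose context window fits the prompt. The sketch below illustrates that policy; it assumes the context window figure bounds the prompt length in tokens, which this document does not state explicitly. The `NpuModel` class and `pick_model` helper are hypothetical and not part of any SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NpuModel:
    name: str
    context_window: int     # tokens (assumed to bound the prompt length)
    max_output_tokens: int
    ttft_ms: float          # reported time to first token


# Figures copied from the two model sections of this page.
MODELS = (
    NpuModel("deepseek-r1-1.5B-ax630c", 128, 1024, 1075.04),
    NpuModel("deepseek-r1-1.5B-p256-ax630c", 256, 1024, 3056.86),
)


def pick_model(prompt_tokens: int) -> Optional[NpuModel]:
    """Return the lowest-TTFT model whose window fits the prompt, else None."""
    fitting = [m for m in MODELS if prompt_tokens <= m.context_window]
    return min(fitting, key=lambda m: m.ttft_ms) if fitting else None


for tokens in (100, 200, 300):
    choice = pick_model(tokens)
    print(tokens, choice.name if choice else "no variant fits")
```

A prompt that fits in 128 tokens goes to the Base Model and avoids the roughly 2.8x longer time to first token reported for the p256 variant.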

Install

apt install llm-model-deepseek-r1-1.5b-p256-ax630c

Manual installation: Click here to download llm-model-deepseek-r1-1.5b-p256-ax630c
