`DeepSeek-R1-Distill-Qwen-1.5B `_
===================================================================================================

Introduction
------------

DeepSeek-R1-Distill-Qwen-1.5B is fine-tuned from open-source models using samples generated by DeepSeek-R1, and has about 1.5 billion parameters. Key highlights of this model include:

- **Type**: Causal Language Model
- **Training Stage**: Pretraining & Post-training
- **Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
- **Number of Parameters**: 1.54B (1.31B non-embedding)
- **Number of Layers**: 28
- **Number of Attention Heads (GQA)**: 12 for Q and 2 for KV
- **Context Length**: Full 131,072 tokens and generation up to 8,192 tokens

Available NPU Models
--------------------

Base Model
~~~~~~~~~~

deepseek-r1-1.5B-ax630c
^^^^^^^^^^^^^^^^^^^^^^^

The **Base Model** provides a 128-token context window and a maximum output of 1,024 tokens.

**Supported Platforms**: LLM630 Compute Kit, Module LLM, and Module LLM Kit

- 128-token context window
- 1,024 max output tokens
- TTFT (time to first token): 1075.04 ms
- Average decode speed: 3.57 tokens/s

Install
"""""""

.. code-block:: shell

    apt install llm-model-deepseek-r1-1.5b-ax630c

**Manual installation:** `Click here to download llm-model-deepseek-r1-1.5b-ax630c `_

Long-Context Model
~~~~~~~~~~~~~~~~~~

deepseek-r1-1.5B-p256-ax630c
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The **Long-Context Model** extends the context of the **Base Model**, offering a 256-token context window and a maximum of 1,024 output tokens.

**Supported Platforms**: LLM630 Compute Kit, Module LLM, and Module LLM Kit

- 256-token context window
- 1,024 max output tokens
- TTFT (time to first token): 3056.86 ms
- Average decode speed: 3.57 tokens/s

Install
"""""""

.. code-block:: shell

    apt install llm-model-deepseek-r1-1.5b-p256-ax630c

**Manual installation:** `Click here to download llm-model-deepseek-r1-1.5b-p256-ax630c `_
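
The ttft and avg-token/s figures listed for the two NPU models can be combined into a rough estimate of how long a reply takes. The following is a minimal sketch, not an official benchmark tool. It assumes the first output token arrives after the listed ttft and every remaining token is produced at the listed average rate. The function name ``estimate_latency_s`` and the ``MODELS`` table are hypothetical names introduced only for this illustration.

.. code-block:: python

    # Figures copied from the model listings in this document.
    # MODELS is a hypothetical lookup table, not part of any package.
    MODELS = {
        "deepseek-r1-1.5B-ax630c": {"ttft_ms": 1075.04, "tokens_per_s": 3.57},
        "deepseek-r1-1.5B-p256-ax630c": {"ttft_ms": 3056.86, "tokens_per_s": 3.57},
    }


    def estimate_latency_s(ttft_ms: float, tokens_per_s: float, n_tokens: int) -> float:
        """Estimate wall-clock seconds needed to produce n_tokens output tokens.

        Assumption: the first token arrives after ttft_ms, and each of the
        remaining (n_tokens - 1) tokens takes 1 / tokens_per_s seconds.
        """
        if n_tokens <= 0:
            return 0.0
        return ttft_ms / 1000.0 + (n_tokens - 1) / tokens_per_s


    if __name__ == "__main__":
        # Compare a short reply with the 1,024-token maximum output.
        for name, spec in MODELS.items():
            for n in (64, 1024):
                seconds = estimate_latency_s(spec["ttft_ms"], spec["tokens_per_s"], n)
                print(f"{name}: {n} tokens in about {seconds:.1f} s")

Under these assumptions the decode rate dominates long replies: at 3.57 tokens/s, a full 1,024-token output takes several minutes on either model, while the larger ttft of the p256 variant mainly affects how soon the first token appears.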