Deploying Phi-4-mini-flash-reasoning with vLLM + Open WebUI
1. Tutorial Introduction

Phi-4-mini-flash-reasoning is a lightweight open-source model released by Microsoft. It is trained on synthetic data with a focus on high-quality, reasoning-dense content, and is further fine-tuned for more advanced mathematical reasoning. The model belongs to the Phi-4 family, supports a 64K-token context length, and adopts a decoder-hybrid-decoder architecture that combines attention with a state-space model (SSM), which gives it strong inference efficiency. The related paper is "Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation".
This tutorial runs on a single RTX 4090 GPU. The project accepts prompts in both Chinese and English.
2. Project Examples

3. Operation Steps
1. After starting the container, click the API address to open the web interface.

2. Usage steps
If "Model" is not displayed, the model is still initializing. Because the model is large, wait about 1-3 minutes and then refresh the page.
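The wait-and-refresh step can also be scripted. The sketch below is a minimal illustration, assuming the container exposes vLLM's OpenAI-compatible API, which lists loaded models on the `/v1/models` route. The helper names `fetch_model_ids` and `wait_for_model`, the default timings, and any base URL you pass in are hypothetical and not taken from this tutorial; substitute the API address shown for your own container.

```python
import json
import time
import urllib.request
from typing import Callable, List


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> List[str]:
    """Return the model ids listed by an OpenAI-compatible /v1/models route."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]


def wait_for_model(
    base_url: str,
    max_wait_s: float = 300.0,   # assumption: generous upper bound for model loading
    interval_s: float = 10.0,    # assumption: pause between checks
    fetch: Callable[[str], List[str]] = fetch_model_ids,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until at least one model is listed, or raise TimeoutError."""
    waited = 0.0
    while True:
        try:
            models = fetch(base_url)
            if models:
                return models
        except OSError:
            # The server is not accepting connections yet (URLError is an OSError).
            pass
        if waited >= max_wait_s:
            raise TimeoutError(f"no model listed after {max_wait_s:.0f} s")
        sleep(interval_s)
        waited += interval_s
```

Calling `wait_for_model("<your API address>")` blocks until the server reports a model and returns the list of model ids. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.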


Citation Information
The citation information for this project is as follows:
@software{archscale2025,
title={ArchScale: Simple and Scalable Pretraining for Neural Architecture Research},
author={Liliang Ren and Zichong Li and Yelong Shen},
year={2025},
url={https://github.com/microsoft/ArchScale}
}
@article{ren2025decoder,
title={Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation},
author={Liliang Ren and Congcong Chen and Haoran Xu and Young Jin Kim and Adam Atkinson and Zheng Zhan and Jiankai Sun and Baolin Peng and Liyuan Liu and Shuohang Wang and Hao Cheng and Jianfeng Gao and Weizhu Chen and Yelong Shen},
journal={arXiv preprint arXiv:2507.06607},
year={2025}
}