One-click Deployment of DeepSeek-R1-70B

1. Tutorial Introduction
DeepSeek-R1-Distill-Llama-70B is an open-source large language model released by DeepSeek in 2025, with a parameter scale of 70 billion. It is trained on top of Llama-3.3-70B-Instruct and uses reinforcement learning and knowledge distillation to improve reasoning performance. It inherits the strengths of the Llama series while further optimizing reasoning ability, especially on mathematics, code, and logical-reasoning tasks, and as a high-performance member of the DeepSeek series it performs well on multiple benchmarks. As a reasoning-enhanced model from DeepSeek AI, it supports a variety of application scenarios, such as mobile and edge computing and online inference services, improving response speed and reducing operating costs. With strong reasoning and decision-making capabilities, it can provide professional, in-depth analysis for advanced AI assistants, scientific research, and similar fields. In medical research, for example, the 70B version can analyze large volumes of medical data and provide valuable references for disease studies.
This tutorial uses Ollama + Open WebUI to deploy DeepSeek-R1-Distill-Llama-70B as a demonstration. The compute resource used is a single A6000 GPU.
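For reference, the Ollama side of such a deployment can be driven from the command line. This is a sketch of the usual commands, assuming Ollama is already installed; the tag `deepseek-r1:70b` is the name used in the Ollama model library for the distilled 70B model, but tag names may differ in your environment:

```shell
# Pull the distilled 70B model from the Ollama library
# (assumption: the tag "deepseek-r1:70b" corresponds to DeepSeek-R1-Distill-Llama-70B)
ollama pull deepseek-r1:70b

# Start an interactive chat session in the terminal
ollama run deepseek-r1:70b

# Ollama also serves an HTTP API (default port 11434),
# which is what Open WebUI connects to; this lists the installed models
curl http://localhost:11434/api/tags
```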
2. Operation steps
1. After starting the container, click the API address to enter the web interface (if "Bad Gateway" is displayed, the model is still initializing; since the model is large, please wait about 5 minutes and try again).

2. After entering the webpage, you can start a conversation with the model.
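Behind the scenes, each chat turn in the web interface becomes a request to Ollama's HTTP API. Here is a minimal sketch of building such a request body for Ollama's `/api/chat` endpoint; the model tag and default parameter values are assumptions for illustration:

```python
import json

def build_chat_request(model, user_message, temperature=0.7, top_k=40, top_p=0.9):
    """Build a JSON payload for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
        # Sampling options (see the conversation settings below)
        "options": {
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
        },
    }
    return json.dumps(payload)

body = build_chat_request("deepseek-r1:70b", "Explain distillation in one sentence.")
print(json.loads(body)["model"])  # → deepseek-r1:70b
```

The resulting JSON string would be POSTed to `http://localhost:11434/api/chat`; Open WebUI manages these requests for you, so this is only to show where the conversation settings end up.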

Common conversation settings
1. Temperature
- Controls the randomness of the output, generally in the range 0.0–2.0.
- Low value (e.g. 0.1): more deterministic, biased toward common words.
- High value (e.g. 1.5): more random; potentially more creative but less stable content.
2. Top-k Sampling
- Samples only from the k highest-probability words, excluding low-probability words.
- Small k (e.g. 10): more deterministic, less randomness.
- Large k (e.g. 50): more diversity, more novelty.
3. Top-p Sampling (Nucleus Sampling)
- Chooses from the smallest set of words whose cumulative probability reaches p; the number of candidates is not fixed.
- Low value (e.g. 0.3): more deterministic, less randomness.
- High value (e.g. 0.9): more diversity, improved fluency.
4. Repetition Penalty
- Controls repetition in the generated text, usually in the range 1.0–2.0.
- High value (e.g. 1.5): reduces repetition and improves readability.
- Low value (e.g. 1.0): no penalty; the model may repeat words and sentences.
5. Max Tokens (maximum generation length)
- Limits the maximum number of tokens the model generates, to avoid overly long output.
- Typical range: 50–4096 (depends on the specific model).
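To make these settings concrete, here is a minimal, self-contained sketch of how temperature, top-k, top-p, and a repetition penalty reshape a next-token distribution. The toy logits and helper function are purely illustrative, using the standard formulas; this is not the model's actual implementation:

```python
import math

def apply_sampling_settings(logits, temperature=1.0, top_k=0, top_p=1.0,
                            repetition_penalty=1.0, previous_tokens=()):
    """Turn raw logits (token -> score) into a filtered probability distribution.

    Returns a dict of token -> probability over the tokens that survive
    top-k and top-p filtering, renormalised to sum to 1.
    """
    scores = dict(logits)

    # Repetition penalty: push down tokens that already appeared
    for tok in previous_tokens:
        if tok in scores:
            s = scores[tok]
            scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: divide logits before softmax (low T -> sharper distribution)
    items = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    exps = [(tok, math.exp(s / temperature)) for tok, s in items]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]

    # Top-k: keep only the k highest-probability tokens (0 = disabled)
    if top_k > 0:
        probs = probs[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative prob reaches p
    kept, cum = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalise over the surviving tokens
    z = sum(p for _, p in kept)
    return {tok: p / z for tok, p in kept}

logits = {"the": 3.0, "a": 2.0, "cat": 1.0, "zebra": -1.0}
dist = apply_sampling_settings(logits, temperature=0.5, top_k=3, top_p=0.9)
print(sorted(dist, key=dist.get, reverse=True)[0])  # → the
```

In this example, the low temperature sharpens the distribution, top-k drops the lowest-probability word, and top-p trims the tail further; a real sampler would then draw one token at random from the surviving distribution.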
Exchange and discussion
🖌️ If you come across a high-quality project, please leave a message in the background to recommend it! We have also set up a tutorial exchange group; scan the QR code and note [SD Tutorial] to join the group, discuss technical issues, and share application results↓