HyperAI

One-click Deployment of Llama 3.1 405B Model OpenAI Compatible API Service

Tutorial and Model Introduction

This tutorial shows how to deploy the Llama-3.1-405B-Instruct-AWQ-INT4 model as an OpenAI-compatible API service, and includes both a text tutorial and a video tutorial.

* Video Tutorial: [OpenBayes official tutorial] Quick deployment of Mistral-Large & Llama-3.1-405B super-large models

This model is the 405B-parameter instruction-tuned version of the Llama 3.1 series of large language models. It uses AWQ quantization to compress the model's weights to INT4 precision, which reduces model size and increases inference speed while largely preserving performance. It is one of the largest open-source models currently available, supports multilingual input and output, and introduces a longer context window to handle more complex tasks and conversations.

An "OpenAI-compatible API" is an application programming interface (API) that follows the interface standards and specifications established by OpenAI for interacting with large language models (such as OpenAI's GPT-series models). This compatibility means that third-party developers can use the same request and response formats as OpenAI to integrate similar functionality into their own applications. For example, a developer who builds a chatbot against OpenAI's API can switch to another service that follows the OpenAI-compatible API standard without significant changes to their code.

Key features of the OpenAI-compatible API include:

  • Standardized requests: API requests follow OpenAI's format, including required parameters and structure.
  • Standardized responses: API responses also follow OpenAI’s format, making processing and parsing results consistent and predictable.
  • Functionality consistency: Provides similar functionality to OpenAI, such as text generation, translation, summarization, etc.
  • Easy to integrate: Developers can easily integrate these APIs into existing systems, leveraging familiar interfaces and patterns.
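In practice, compatibility means a client only needs its base URL changed. As a minimal sketch (the base URL and model name below are illustrative placeholders, not values from this tutorial), the request shape that any OpenAI-compatible server expects can be built like this:

```python
import json

def build_chat_request(base_url: str, model: str, messages: list) -> tuple:
    """Build the endpoint URL and JSON body of an OpenAI-style chat
    completion request. Any OpenAI-compatible server accepts this shape,
    so switching providers only means changing base_url."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

# Placeholder base URL and model name, for illustration only.
url, body = build_chat_request(
    "https://your-deployment.example.com",
    "llama-3.1-405b-instruct",
    [{"role": "user", "content": "Hello"}],
)
```

Because the request shape is fixed, the same helper works unchanged against OpenAI itself or against the service deployed in this tutorial.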

Text Tutorial

1. Clone the tutorial and start the container from the upper-right corner of the tutorial page

The OpenAI-compatible API automatically starts all services after successful deployment; no additional intervention is required.

2. Copy the API address and open it in a new browser tab

A default 404 message is displayed. This is expected, since the service only responds on its API paths.

3. Append the path '/v1/models' to the API address

You can see that the deployment information of the model is displayed.
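The same check can be scripted. Below is a minimal sketch using only the Python standard library (the API address in the usage comment is a placeholder you must replace with your deployment's address); since the response follows the OpenAI list format, the model ids can be extracted generically:

```python
import json
import urllib.request

def parse_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style /v1/models response:
    {"object": "list", "data": [{"id": "...", ...}, ...]}."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str) -> list:
    """GET <base_url>/v1/models and return the deployed model ids."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/v1/models") as resp:
        return parse_model_ids(json.load(resp))

# Usage (replace with the API address copied from your deployment):
#   print(list_models("https://your-deployment.example.com"))
```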

4. You can now connect to the model from any OpenAI-compatible SDK or client. Here we take OpenWebUI as an example and integrate this API with a local OpenWebUI instance

Start an OpenWebUI service locally, add a new connection under "External Connections", and enter the API address with '/v1' appended in the "OpenAI API" field. No real "API key" is required here, so any custom value can be entered. Then click Save in the lower-right corner.

5. Deployment completed

You can see that the OpenWebUI interface now lists the Llama-3.1-405b model. Simply enter a message in the input box below to chat with the model.
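Beyond OpenWebUI, the deployed service can be called from any OpenAI-compatible client. The sketch below uses only the Python standard library; the address and model id in the usage comment are illustrative placeholders to replace with your deployment's values:

```python
import json
import urllib.request

def chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion body -- the same shape
    OpenWebUI sends to the /v1 endpoint configured above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(base_url: str, model: str, user_message: str) -> str:
    """POST to <base_url>/v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(chat_payload(model, user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (replace the address and model id with your deployment's values):
#   reply = chat("https://your-deployment.example.com",
#                "llama-3.1-405b-instruct", "Hello!")
```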