One-click Deployment of Phi-3.5-vision-instruct

Model Introduction

Phi-3.5-vision-instruct is a multimodal model in Microsoft's Phi-3.5 series, designed for applications that process both text and visual input. It supports a context length of 128K tokens and has undergone a rigorous fine-tuning and optimization process, making it suitable for broad commercial and research use in environments with limited memory or compute and strict latency requirements. The model offers general image understanding, optical character recognition (OCR), chart and table parsing, and multi-image or video-clip summarization, making it well suited to a wide variety of AI-driven applications, and it shows significant performance gains on image- and video-related benchmarks. Architecturally, it is a 4.2-billion-parameter system that combines an image encoder, a connector, a projector, and the Phi-3 Mini language model. Training used 256 NVIDIA A100-80G GPUs over 6 days on 500 billion tokens of visual and text data.
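For readers who want to try the model outside the one-click container, the sketch below shows a standard Hugging Face transformers inference flow for it. The model ID and the <|image_1|> image-placeholder convention follow the public model card; the example image URL, prompt, and generation settings are illustrative assumptions, not values from this tutorial.

```python
# Minimal local-inference sketch for Phi-3.5-vision-instruct via transformers.
# The image URL, prompt, and max_new_tokens are placeholders, not tuned values.
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    _attn_implementation="eager",  # switch to "flash_attention_2" if flash-attn is installed
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True, num_crops=4)

# Load one example image; replace the URL with your own input.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

# <|image_1|> marks where the first image is injected into the prompt.
messages = [
    {"role": "user", "content": "<|image_1|>\nSummarize the content of this chart."}
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

output_ids = model.generate(
    **inputs,
    max_new_tokens=500,
    eos_token_id=processor.tokenizer.eos_token_id,
)
# Strip the prompt tokens before decoding the answer.
output_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```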

The Phi-3.5-vision-instruct model scored 43.0 on the MMMU (Massive Multi-discipline Multimodal Understanding) benchmark, demonstrating its enhanced ability to handle complex image understanding tasks. In addition, the model was trained on high-quality educational data, synthetic data, and strictly screened public documents to ensure data quality and privacy.

This tutorial can be run on a single RTX 4090.

How to Run

1. After cloning and successfully starting the container, wait about 10 seconds, then hover over "API Address", copy the link, and open it in a new browser tab
2. You will then see the demo interface
3. Click to upload an image, select a model, enter your question, and click Submit (the demo can also be queried programmatically; see the sketch after this list)
4. The generated result is displayed
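Since the demo exposed at the API address is a Gradio app, it can also be called from Python once the container is running. The sketch below uses the gradio_client package; the URL, endpoint name, and argument order are hypothetical, so run client.view_api() first to see the actual signature of your deployment.

```python
# Hypothetical sketch for querying the deployed Gradio demo programmatically.
# The URL, endpoint name, and arguments are assumptions; verify them with
# client.view_api() before relying on this call.
from gradio_client import Client, handle_file

client = Client("https://<your-api-address>")  # paste the copied API Address here
client.view_api()  # prints the deployment's actual endpoints and parameters

# Hypothetical call: adjust api_name and arguments to match view_api() output.
result = client.predict(
    handle_file("question.png"),  # the image you would otherwise upload
    "Describe this image.",       # the question typed into the text box
    api_name="/predict",
)
print(result)
```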

Exchange and Discussion

🖌️ If you come across a high-quality project, please leave us a message to recommend it! We have also set up a tutorial exchange group. Friends are welcome to scan the QR code and add the note [SD Tutorial] to join the group, discuss technical issues, and share results↓