HyperAI

Describe Anything Model Demo

Project Overview

Describe Anything Model (DAM) is an image and video captioning model jointly developed by teams from NVIDIA, UC Berkeley, and UCSF, released in 2025. Given a user-specified region (points, a box, a scribble, or a mask), the model generates a detailed description of that region. For video, you only need to annotate the region on any single frame to get a complete description of it. The related paper is "Describe Anything: Detailed Localized Image and Video Captioning".
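Region prompts like these are commonly represented as binary masks over the image. The sketch below is a hypothetical illustration, not DAM's actual preprocessing: it shows how a box or a set of points (a scribble being a dense run of points) could be rasterized into a mask. The function names, the `(x0, y0, x1, y1)` box convention, and the `radius` parameter are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical mask format: row-major, mask[y][x] == 1 means pixel (x, y) is in the region.
Mask = List[List[int]]


def box_to_mask(width: int, height: int, box: Tuple[int, int, int, int]) -> Mask:
    """Rasterize a box (x0, y0, x1, y1), with x1/y1 exclusive, into a mask."""
    x0, y0, x1, y1 = box
    # Clamp the box to the image bounds so out-of-range boxes do not raise.
    x0, x1 = max(0, min(x0, width)), max(0, min(x1, width))
    y0, y1 = max(0, min(y0, height)), max(0, min(y1, height))
    return [
        [1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(width)]
        for y in range(height)
    ]


def points_to_mask(
    width: int, height: int, points: Iterable[Tuple[int, int]], radius: int = 0
) -> Mask:
    """Mark each (x, y) point, grown into a (2*radius+1)-wide square.

    A scribble can be approximated as a dense sequence of such points.
    """
    mask = [[0] * width for _ in range(height)]
    for px, py in points:
        for y in range(py - radius, py + radius + 1):
            for x in range(px - radius, px + radius + 1):
                if 0 <= x < width and 0 <= y < height:  # skip pixels off the image
                    mask[y][x] = 1
    return mask


def mask_area(mask: Mask) -> int:
    """Number of pixels inside the region."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # A 4x3 image with a 2x2 box in the top middle.
    print(box_to_mask(4, 3, (1, 0, 3, 2)))  # [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Whatever the real pipeline does internally, reducing every prompt type to one mask format is a simple way to give a model a single, uniform region input.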

This tutorial runs on a single RTX 4090 GPU.

Project Examples

Steps to Run

1. After starting the container, click the API address to open the web interface

If the page shows "Bad Gateway", the model is still initializing. Because the model is large, wait about 1-2 minutes and then refresh the page.
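If you would rather script the "wait and refresh" step than reload the browser by hand, a minimal polling sketch using only the Python standard library could look like this. It assumes the web interface answers a plain GET request and that a 2xx/3xx status means it is ready; the helper names and the 12 attempts × 10 s default (about 2 minutes) are illustrative choices, not part of the tutorial.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET url and return its HTTP status code (0 if there was no response at all)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 502 Bad Gateway are raised as HTTPError.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeouts, etc.: treat as "not up yet".
        return 0


def wait_until_ready(
    url: str,
    fetch: Callable[[str], int] = http_status,
    attempts: int = 12,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it answers with a 2xx/3xx status; give up after `attempts` tries."""
    for i in range(attempts):
        status = fetch(url)
        if 200 <= status < 400:
            return True
        if i < attempts - 1:
            sleep(delay)  # model still loading (e.g. 502): wait, then "refresh"
    return False
```

Usage: `wait_until_ready("http://<your-api-address>/")`, substituting the API address shown for your container. The `fetch` and `sleep` parameters exist only so the loop can be tested without a network.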

2. Once the web page loads, you can interact with the model

Images should not exceed 5 MB, and videos should be no longer than 20 seconds and no larger than 5 MB; larger inputs may slow the model down or cause errors. Choose the region to be described carefully.
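To avoid slow runs or errors, you could check inputs against these limits before uploading. The sketch below is a hypothetical helper, not part of the demo. It reads "5 MB" conservatively as 5,000,000 bytes, and because reading a video's duration requires a media library, it expects the caller to pass the duration in seconds.

```python
import os
from typing import List, Optional

# Limits stated in this tutorial; "5 MB" is read conservatively as 5,000,000 bytes.
MAX_BYTES = 5_000_000
MAX_VIDEO_SECONDS = 20.0


def upload_problems(kind: str, size_bytes: int, duration_s: Optional[float] = None) -> List[str]:
    """Return the reasons an input exceeds the limits (an empty list means OK)."""
    if kind not in ("image", "video"):
        raise ValueError(f"kind must be 'image' or 'video', got {kind!r}")
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"{kind} is {size_bytes / 1e6:.1f} MB; limit is 5 MB")
    if kind == "video":
        # The duration must come from the caller (e.g. read with a media tool).
        if duration_s is None:
            problems.append("video duration not given; limit is 20 s")
        elif duration_s > MAX_VIDEO_SECONDS:
            problems.append(f"video is {duration_s:.1f} s long; limit is 20 s")
    return problems


def check_file(path: str, kind: str, duration_s: Optional[float] = None) -> List[str]:
    """Convenience wrapper that reads the file size from disk."""
    return upload_problems(kind, os.path.getsize(path), duration_s)
```

For example, `upload_problems("video", 6_000_000, 25.0)` reports both the size and the length problem, while a 1 MB image passes with an empty list.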

This tutorial provides two modules to test: Image Mode and Video Mode.

The functions of each module are as follows:

Image Mode

Video Mode

Citation Information

Thanks to GitHub user zhangjunchang for deploying this tutorial. The project's reference information is as follows:

@article{lian2025describe,
  title={Describe Anything: Detailed Localized Image and Video Captioning}, 
  author={Long Lian and Yifan Ding and Yunhao Ge and Sifei Liu and Hanzi Mao and Boyi Li and Marco Pavone and Ming-Yu Liu and Trevor Darrell and Adam Yala and Yin Cui},
  journal={arXiv preprint arXiv:2504.16072},
  year={2025}
}