HyperAI

Video captioning is a computer vision task that aims to automatically understand the actions and events in a video and generate a corresponding textual description. Because the generated captions are text, they make video content searchable with text queries, which improves the accessibility and usability of video data.

MSR-VTT

mPLUG-2