HyperAIHyperAI

Command Palette

Search for a command to run...

General Object Foundation Model for Images and Videos at Scale

Junfeng Wu Yi Jiang Qihao Liu Zehuan Yuan Xiang Bai Song Bai

Abstract

We present GLEE in this work, an object-level foundation model for locatingand identifying objects in images and videos. Through a unified framework, GLEEaccomplishes detection, segmentation, tracking, grounding, and identificationof arbitrary objects in the open world scenario for various object perceptiontasks. Adopting a cohesive learning strategy, GLEE acquires knowledge fromdiverse data sources with varying supervision levels to formulate generalobject representations, excelling in zero-shot transfer to new data and tasks.Specifically, we employ an image encoder, text encoder, and visual prompter tohandle multi-modal inputs, enabling to simultaneously solve variousobject-centric downstream tasks while maintaining state-of-the-art performance.Demonstrated through extensive training on over five million images fromdiverse benchmarks, GLEE exhibits remarkable versatility and improvedgeneralization performance, efficiently tackling downstream tasks without theneed for task-specific adaptation. By integrating large volumes ofautomatically labeled data, we further enhance its zero-shot generalizationcapabilities. Additionally, GLEE is capable of being integrated into LargeLanguage Models, serving as a foundational model to provide universalobject-level information for multi-modal tasks. We hope that the versatilityand universality of our method will mark a significant step in the developmentof efficient visual foundation models for AGI systems. The model and code willbe released at https://glee-vision.github.io .


Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing

HyperAI Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp