Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration

Jiaheng Liu Yuanxing Zhang Shihao Li Xinping Lei

Abstract

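For the past decade, the development of generative artificial intelligence (AI) has been driven by a model-centric paradigm built on scaling laws. Despite major advances in visual fidelity, this approach has hit a “usability ceiling”: the Intent-Execution Gap, a fundamental disparity between a creator’s high-level intent and the stochastic, opaque nature of current single-shot models. Inspired by Vibe Coding, this paper proposes Vibe AIGC, a new paradigm for content generation via agentic orchestration, which refers to the autonomous synthesis of hierarchical multi-agent workflows. Under this paradigm, the user’s role goes beyond traditional prompt engineering. The user evolves into a “Commander” who provides a “Vibe”, a high-level representation encompassing aesthetic preferences, functional logic, and more. A centralized Meta-Planner acts as a system architect, decomposing this “Vibe” into executable, verifiable, and adaptive agentic pipelines. By shifting from stochastic inference to logical orchestration, Vibe AIGC bridges the gap between human imagination and machine execution. We argue that this shift will redefine the human-AI collaborative economy, transforming AI from a fragile inference engine into a system-level engineering partner that democratizes the creation of complex, long-horizon digital assets.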

One-sentence Summary

Researchers from Nanjing University and Kuaishou Technology propose Vibe AIGC, a multi-agent orchestration framework that replaces stochastic generation with logical pipelines, enabling users to command complex outputs via high-level “Vibe” prompts—bridging intent-execution gaps and democratizing long-horizon digital creation.

Key Contributions

  • The paper identifies the "Intent-Execution Gap" as a critical limitation of current model-centric AIGC systems, where stochastic single-shot generation fails to align with users’ high-level creative intent, forcing reliance on inefficient prompt engineering.
  • It introduces Vibe AIGC, a new paradigm that replaces monolithic inference with hierarchical multi-agent orchestration, where a Commander provides a high-level “Vibe” and a Meta-Planner decomposes it into verifiable, adaptive workflows.
  • Drawing inspiration from Vibe Coding, the framework repositions AI as a system-level engineering partner, enabling scalable, long-horizon content creation by shifting focus from model scaling to intelligent agentic coordination.

Introduction

The authors leverage the emerging concept of Vibe Coding to propose Vibe AIGC, a new paradigm that shifts content generation from single-model inference to hierarchical multi-agent orchestration. Current AIGC tools face a persistent Intent-Execution Gap: users must manually engineer prompts to coax coherent outputs from black-box models, a process that’s stochastic, inefficient, and ill-suited for complex, long-horizon tasks like video production or narrative design. Prior approaches—whether scaling models or stitching together fixed workflows—fail to bridge this gap because they remain tool-centric and lack adaptive, verifiable reasoning. The authors’ main contribution is a system where users act as Commanders, supplying a high-level “Vibe” (aesthetic, functional, and contextual intent), which a Meta-Planner decomposes into executable, falsifiable agent pipelines. This moves AI from fragile inference engine to collaborative engineering partner, enabling scalable, intent-driven creation of complex digital assets.

Method

The authors use a hierarchical, intent-driven architecture to bridge the semantic gap between abstract creative directives and precise, executable media generation workflows. At the core of this system is the Meta-Planner, which functions not as a content generator but as a system architect: it translates natural language “Commander Instructions”, often laden with subjective “Vibe” signals such as “oppressive atmosphere” or “Hitchcockian suspense”, into structured, domain-aware execution plans. This transformation is enabled by tight integration with a Domain-Specific Expert Knowledge Base, which encodes professional heuristics, genre constraints, and algorithmic workflows. For instance, the phrase “Hitchcockian suspense” is deconstructed into concrete directives: dolly zoom camera movements, high-contrast lighting, dissonant musical intervals, and narrative pacing based on information asymmetry. This process externalizes implicit creative knowledge, mitigating the hallucinations and mediocrity common in general-purpose LLMs.
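The lookup step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `EXPERT_KB` layout, the channel names, and the `decompose_vibes` helper are all hypothetical. Only the four "Hitchcockian suspense" directives come from the text. Vibe phrases without a knowledge-base entry are reported as unresolved rather than guessed.

```python
from dataclasses import dataclass, field

# Hypothetical knowledge base. The "hitchcockian suspense" directives are the
# ones listed in the text; the channel names and dict layout are assumptions.
EXPERT_KB = {
    "hitchcockian suspense": {
        "camera": ["dolly zoom"],
        "lighting": ["high-contrast lighting"],
        "music": ["dissonant musical intervals"],
        "narrative": ["pacing based on information asymmetry"],
    },
}


@dataclass
class ExecutionPlan:
    # Concrete directives grouped by production channel.
    directives: dict[str, list[str]] = field(default_factory=dict)
    # Vibe phrases the knowledge base could not resolve.
    unresolved: list[str] = field(default_factory=list)


def decompose_vibes(vibes: list[str], kb: dict = EXPERT_KB) -> ExecutionPlan:
    """Map each vibe phrase to concrete directives via a knowledge-base lookup."""
    plan = ExecutionPlan()
    for vibe in vibes:
        entry = kb.get(vibe.strip().lower())
        if entry is None:
            # No expert entry: flag it instead of inventing directives.
            plan.unresolved.append(vibe)
            continue
        for channel, items in entry.items():
            plan.directives.setdefault(channel, []).extend(items)
    return plan


plan = decompose_vibes(["Hitchcockian suspense", "oppressive atmosphere"])
print(plan.directives)
print(plan.unresolved)
```

A real system would presumably resolve phrases with retrieval or an LLM rather than exact string matching; the dictionary lookup here only shows the shape of the mapping from subjective phrase to structured plan.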

As shown in the figure below, the architecture operates across two primary layers: the Creative Layer and the Algorithmic Layer. The Creative Layer generates a macro-level SOP (standard operating procedure) blueprint, encompassing script specification, storyboard drawing, and voice-over planning, based on the parsed intent. This blueprint is then propagated to the Algorithmic Layer, which dynamically constructs and configures a workflow graph composed of AI Agents, foundation models, and media processing modules. The system adapts its orchestration topology to task complexity: a simple image generation may trigger a linear pipeline, while a full music video demands a graph incorporating script decomposition, consistent character generation, keyframe rendering, and post-production effects. Crucially, the Meta-Planner also configures operational hyperparameters, such as sampling steps and denoising strength, to ensure industrial-grade fidelity.
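To make the idea of a task-dependent workflow topology concrete, here is a small Python sketch using the standard library's `graphlib`. The stage names for the music video follow the text, but the dependency edges between them, the image pipeline stages, the `build_workflow` and `plan_task` helpers, and the hyperparameter values are all assumptions chosen for illustration.

```python
from graphlib import TopologicalSorter

# Placeholder values for illustration only; they are not taken from the paper.
DEFAULT_HYPERPARAMS = {"sampling_steps": 30, "denoising_strength": 0.6}


def build_workflow(task: str) -> dict[str, set[str]]:
    """Return a graph mapping each stage to the stages it depends on."""
    if task == "image":
        # Simple task: a linear two-stage pipeline (stage names are hypothetical).
        return {"parse_intent": set(), "generate_image": {"parse_intent"}}
    if task == "music_video":
        # Complex task: stage names from the text, edges assumed.
        return {
            "script_decomposition": set(),
            "character_generation": {"script_decomposition"},
            "keyframe_rendering": {"script_decomposition", "character_generation"},
            "post_production": {"keyframe_rendering"},
        }
    raise ValueError(f"unknown task type: {task!r}")


def plan_task(task: str, hyperparams: dict = DEFAULT_HYPERPARAMS) -> dict:
    """Build the graph, order it so dependencies run first, attach hyperparameters."""
    graph = build_workflow(task)
    order = list(TopologicalSorter(graph).static_order())
    return {"order": order, "hyperparams": dict(hyperparams)}


image_plan = plan_task("image")
video_plan = plan_task("music_video")
print(image_plan["order"])
print(video_plan["order"])
```

Representing the workflow as a dependency graph rather than a fixed list is what lets the same planner emit either a linear pipeline or a branching one; the topological sort only guarantees that each stage runs after the stages it depends on.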

Human-in-the-loop mechanisms are embedded throughout the pipeline, allowing for real-time refinements and corrections at both the creative and algorithmic levels. This closed-loop design ensures that the system remains responsive to evolving user intent while maintaining technical consistency. The Meta-Planner’s reasoning is not static; it dynamically grows the workflow from the top down, perceiving the user’s “Vibe” in real time, disambiguating intent via expert knowledge, and ultimately producing a precise, executable workflow graph. This architecture represents a paradigm shift from fragmented, manual, or end-to-end black-box systems toward a unified, agentic, and semantically grounded framework for creative content generation.
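The closed-loop refinement described above might be sketched as follows. This is an assumed, simplified control loop rather than the paper's mechanism: the `refine` function, the flat dictionary plan, the `reviewer` callback, and the `max_rounds` cap are all hypothetical. The reviewer stands in for the human Commander and either approves the plan (returns `None`) or returns corrections to merge into it.

```python
from typing import Callable, Optional

Plan = dict[str, str]


def refine(
    plan: Plan,
    review: Callable[[Plan], Optional[Plan]],
    max_rounds: int = 3,
) -> tuple[Plan, int]:
    """Apply reviewer corrections until the plan is approved or rounds run out.

    Returns the final plan and the number of correction rounds applied.
    """
    for round_idx in range(max_rounds):
        feedback = review(plan)
        if feedback is None:
            # Reviewer approved the current plan.
            return plan, round_idx
        # Merge corrections without mutating the caller's plan.
        plan = {**plan, **feedback}
    return plan, max_rounds


def reviewer(plan: Plan) -> Optional[Plan]:
    """Stand-in for a human Commander who insists on high-contrast lighting."""
    if plan.get("lighting") != "high-contrast":
        return {"lighting": "high-contrast"}
    return None


final_plan, rounds = refine({"lighting": "flat", "camera": "dolly zoom"}, reviewer)
print(final_plan, rounds)
```

Capping the number of rounds is a design choice made here so the loop always terminates; an interactive system could instead keep the loop open for as long as the user continues to give feedback.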

