HyperAI

Modality Generator

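Modality Generator (MG) is the component of a multimodal learning system that produces outputs in non-text modalities, such as images, videos, or audio. In multimodal large language models, it usually works together with a Modality Encoder (ME), an Input Projector (IP), an LLM Backbone (large language model backbone), and an Output Projector (OP). The encoder and input projector turn non-text inputs into representations the LLM backbone can process; the output projector then maps the backbone's outputs into conditioning signals that the modality generator uses to produce the final image, video, or audio. Together, these components let the system both understand and generate multimodal data.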
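The data flow described above can be illustrated with a toy pipeline. This is a minimal sketch, not a real model: every stage is a hypothetical stand-in (a rescaling "encoder", a tanh "backbone", small hand-written projection matrices, and a "generator" that only reports the conditioning it received). The class name `ToyMultimodalPipeline` and the helper `linear` are illustrative assumptions. The point it shows is the ordering ME → IP → LLM Backbone → OP → MG, and that the output projector reshapes the backbone's hidden state into whatever size the generator expects as its condition.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


@dataclass
class ToyMultimodalPipeline:
    """Hypothetical stand-ins for ME -> IP -> LLM Backbone -> OP -> MG."""

    input_proj: Matrix   # IP: encoder feature size -> backbone embedding size
    output_proj: Matrix  # OP: backbone hidden size -> generator conditioning size

    def modality_encoder(self, raw: Vector) -> Vector:
        # Stand-in for a pretrained encoder: just rescales the raw features.
        return [0.5 * v for v in raw]

    def llm_backbone(self, embeddings: Vector) -> Vector:
        # Stand-in for a transformer: applies tanh elementwise.
        return [math.tanh(e) for e in embeddings]

    def modality_generator(self, condition: Vector) -> str:
        # Stand-in for e.g. a diffusion model: reports the conditioning it received.
        return f"output conditioned on a {len(condition)}-dim vector"

    def run(self, raw: Vector) -> str:
        features = self.modality_encoder(raw)            # ME
        embeddings = linear(self.input_proj, features)   # IP
        hidden = self.llm_backbone(embeddings)           # LLM Backbone
        condition = linear(self.output_proj, hidden)     # OP
        return self.modality_generator(condition)        # MG


pipe = ToyMultimodalPipeline(
    input_proj=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # 2 -> 3 dims
    output_proj=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],   # 3 -> 2 dims
)
print(pipe.run([0.2, 0.4]))  # output conditioned on a 2-dim vector
```

In a real system each stand-in would be a pretrained network, and typically only the projectors (and sometimes parts of the backbone) are trained, but that detail is outside what this sketch models.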

Modality generators are often built from existing off-the-shelf generative models. Examples include, but are not limited to:

  • Image generation: Stable Diffusion, a text-to-image model based on latent diffusion.
  • Video generation: Zeroscope, a text-to-video diffusion model.
  • Audio generation: AudioLDM, a latent diffusion model that generates audio from text.
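Since each output modality is served by a different model, a system needs some way to send the conditioning signal to the right generator. One simple approach, assumed here purely for illustration, is a lookup table keyed by a modality tag. The registry `GENERATORS`, the `generate` function, and the stub factory `make_stub` are hypothetical; the stubs only name the model they stand in for and do not call Stable Diffusion, Zeroscope, or AudioLDM.

```python
from typing import Callable, Dict, List

Generator = Callable[[List[float]], str]


def make_stub(model_name: str) -> Generator:
    """Return a placeholder generator that only reports which model would run."""
    def _run(condition: List[float]) -> str:
        return f"{model_name} output from a {len(condition)}-dim condition"
    return _run


# Hypothetical registry: modality tag -> generator (stubs, not the real models).
GENERATORS: Dict[str, Generator] = {
    "image": make_stub("Stable Diffusion"),
    "video": make_stub("Zeroscope"),
    "audio": make_stub("AudioLDM"),
}


def generate(modality: str, condition: List[float]) -> str:
    """Dispatch a conditioning vector to the generator registered for `modality`."""
    generator = GENERATORS.get(modality)
    if generator is None:
        raise ValueError(f"no generator registered for modality {modality!r}")
    return generator(condition)


print(generate("audio", [0.1, 0.2, 0.3]))  # AudioLDM output from a 3-dim condition
```

With a dictionary, supporting another modality is a one-line registration; in practice each entry would wrap a pretrained model loaded once, rather than a stub.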