8 months ago

Image Generation

Visual Question Answering

Computer Vision

Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

Abstract

Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model that addresses these issues by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes. Our model demonstrates improved empirical performance on large scene graphs, robustness to noise in the input scene graph, and generalization on semantically equivalent graphs. Finally, we show improved performance of the model on three different benchmarks: Visual Genome, COCO, and CLEVR.
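
To make the notion of semantic equivalence concrete, here is a minimal sketch of two scene graphs that describe the same scene with different edges. It is an illustration only and not the authors' method: the abstract says the canonical representations are learned from data, whereas this sketch assumes a hand-written converse table (`CONVERSE`), a hand-picked set of transitive relations (`TRANSITIVE`), and a hypothetical `canonicalize` function that closes a graph under those rules.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical, hand-written rules for illustration only.
# The paper learns canonical representations from data instead.
CONVERSE = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
}
TRANSITIVE = {"left of", "right of", "above", "below"}


def canonicalize(triples: Iterable[Triple]) -> FrozenSet[Triple]:
    """Close a scene graph under converse and transitive rules.

    Two graphs that imply the same set of relations end up with
    the same closed edge set, so they compare equal.
    """
    closed: Set[Triple] = set(triples)
    changed = True
    while changed:
        changed = False
        new: Set[Triple] = set()
        # Converse rule: (a, left of, b) implies (b, right of, a).
        for s, r, o in closed:
            if r in CONVERSE:
                new.add((o, CONVERSE[r], s))
        # Transitive rule: (a, r, b) and (b, r, c) imply (a, r, c).
        for s1, r1, o1 in closed:
            if r1 not in TRANSITIVE:
                continue
            for s2, r2, o2 in closed:
                if r1 == r2 and o1 == s2 and s1 != o2:
                    new.add((s1, r1, o2))
        if not new <= closed:
            closed |= new
            changed = True
    return frozenset(closed)


# Two different graphs describing the same scene.
graph_a = [("cat", "left of", "dog"), ("dog", "left of", "tree")]
graph_b = [("dog", "right of", "cat"), ("tree", "right of", "dog")]

same_scene = canonicalize(graph_a) == canonicalize(graph_b)
```

A generator that consumes `graph_a` and `graph_b` directly sees different inputs, although both imply the same layout. Mapping both to one canonical form removes that difference, which is the kind of limitation the abstract refers to.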

Learning Canonical Representations for Scene Graph to Image Generation