Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning

Xu Yang, Chongyang Gao, Hanwang Zhang, and Jianfei Cai

Abstract

When we humans tell a long paragraph about an image, we usually first implicitly compose a mental “script” and then comply with it to generate the paragraph. Inspired by this, we endow the modern encoder-decoder based image paragraph captioning model with such ability by proposing the Hierarchical Scene Graph Encoder-Decoder (HSGED) for generating coherent and distinctive paragraphs. In particular, we use the image scene graph as the “script” to incorporate rich semantic knowledge and, more importantly, the hierarchical constraints into the model. Specifically, we design a sentence scene graph RNN (SSG-RNN) to generate sub-graph level topics, which constrain the word scene graph RNN (WSG-RNN) to generate the corresponding sentences. We propose irredundant attention in SSG-RNN to increase the likelihood of abstracting topics from rarely described sub-graphs, and inheriting attention in WSG-RNN to generate more grounded sentences with the abstracted topics, both of which give rise to more distinctive paragraphs. An efficient sentence-level loss is also proposed to encourage the sequence of generated sentences to be similar to that of the ground-truth paragraphs. We validate HSGED on the Stanford image paragraph dataset and show that it not only achieves a new state-of-the-art CIDEr-D score of 36.02, but also generates more coherent and distinctive paragraphs under various metrics.
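The abstract does not give the formulation of irredundant attention, so the following is only a minimal sketch of the general idea, under assumptions of our own: sub-graphs that have already received attention are penalized, so later sentences are more likely to pick topics from sub-graphs not yet described. The function names, the coverage-penalty rule, and the `penalty` parameter are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def irredundant_attention(scores, coverage, penalty=5.0):
    """Hypothetical rule: lower each sub-graph's raw score in proportion
    to the attention it has already accumulated, then normalize."""
    adjusted = [s - penalty * c for s, c in zip(scores, coverage)]
    return softmax(adjusted)


def pick_topics(scores, num_sentences, penalty=5.0):
    """Choose one sub-graph index (topic) per sentence, accumulating
    attention so that previously attended sub-graphs are discouraged."""
    coverage = [0.0] * len(scores)
    topics = []
    for _ in range(num_sentences):
        weights = irredundant_attention(scores, coverage, penalty)
        topics.append(max(range(len(weights)), key=weights.__getitem__))
        coverage = [c + w for c, w in zip(coverage, weights)]
    return topics
```

With `penalty=0.0` the sketch reduces to ordinary attention and keeps selecting the highest-scoring sub-graph for every sentence. With a positive penalty, the second sentence moves to a different sub-graph, which is the behavior the abstract attributes to irredundant attention.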

