
CounTR: Transformer-based Generalised Visual Counting

Liu, Chang ; Zhong, Yujie ; Zisserman, Andrew ; Xie, Weidi
Abstract

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e. zero-shot or few-shot counting. To this end, we make the following four contributions: (1) We introduce a novel transformer-based architecture for generalised visual object counting, termed Counting Transformer (CounTR), which explicitly captures the similarity between image patches, or between image patches and the given "exemplars", using the attention mechanism; (2) We adopt a two-stage training regime that first pre-trains the model with self-supervised learning and then applies supervised fine-tuning; (3) We propose a simple, scalable pipeline for synthesizing training images that contain a large number of instances, or instances from different semantic categories, explicitly forcing the model to make use of the given "exemplars"; (4) We conduct thorough ablation studies on the large-scale counting benchmark FSC-147 and demonstrate state-of-the-art performance in both zero-shot and few-shot settings.
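
To make contribution (1) more concrete, the following is a minimal sketch of how an attention mechanism can relate image patches either to given exemplars (few-shot) or to each other (zero-shot). It is not the CounTR implementation: the function names `softmax`, `attend`, and `random_tokens`, the token counts, the embedding dimension, and the random features are all assumptions chosen for illustration, and learned projections, multiple heads, and the density-map decoder are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections.

    Each query is replaced by a similarity-weighted average of the values,
    where similarity is the scaled dot product between the query and each key.
    """
    dim = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        out = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


def random_tokens(count, dim):
    """Hypothetical stand-in for features produced by a visual encoder."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


random.seed(0)
patches = random_tokens(16, 8)    # assumed: 16 image-patch embeddings of size 8
exemplars = random_tokens(3, 8)   # assumed: 3 exemplar embeddings (few-shot)

# Few-shot: patches query the exemplars, so each weight row scores how
# strongly one patch matches each exemplar.
fused_few_shot, w_few_shot = attend(patches, exemplars, exemplars)

# Zero-shot: with no exemplars, patches attend to one another, so repeated
# structures in the image can reinforce each other.
fused_zero_shot, w_zero_shot = attend(patches, patches, patches)
```

In this sketch the few-shot and zero-shot cases differ only in which tokens supply the keys and values, which is one simple way to read the phrase "similarity between image patches or with given exemplars".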
