Learning to Count Anything: Reference-less Class-agnostic Counting with Weak Supervision

Current class-agnostic counting methods can generalise to unseen classes, but usually require reference images to define the type of object to be counted, as well as instance annotations during training. Reference-less class-agnostic counting is an emerging field that identifies counting as, at its core, a repetition-recognition task. Such methods facilitate counting when the composition of the object set changes. We show that a general feature space with global context can enumerate instances in an image without a prior on the object type present. Specifically, we demonstrate that regression from vision transformer features, without point-level supervision or reference images, is superior to other reference-less methods and is competitive with methods that use reference images. We show this on the current standard few-shot counting dataset, FSC-147. We also propose an improved dataset, FSC-133, which removes errors, ambiguities, and repeated images from FSC-147, and demonstrate similar performance on it. To the best of our knowledge, ours is the first weakly-supervised reference-less class-agnostic counting method.
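The core idea described above, regressing an image-level count from general-purpose vision transformer features using only count labels, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the backbone choice (timm's `vit_base_patch16_224`), freezing the backbone, the shape of the regression head, and the loss function are all assumptions made for the example.

```python
# Minimal sketch: weakly-supervised reference-less counting by regressing a
# scalar count from pretrained ViT features, supervised only by per-image counts
# (no point annotations, no reference images). Hyperparameters are illustrative.
import torch
import torch.nn as nn
import timm


class ViTCountRegressor(nn.Module):
    def __init__(self, backbone_name: str = "vit_base_patch16_224"):
        super().__init__()
        # A pretrained ViT supplies a general feature space with global context.
        self.backbone = timm.create_model(backbone_name, pretrained=True, num_classes=0)
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone kept frozen in this sketch
        feat_dim = self.backbone.num_features
        # Small head mapping pooled features to a single non-negative count.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Softplus(),  # counts are non-negative
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)        # (B, feat_dim) pooled ViT features
        return self.head(feats).squeeze(-1)  # (B,) predicted counts


def train_step(model, images, gt_counts, optimizer, loss_fn=nn.SmoothL1Loss()):
    """One weakly-supervised step: the only label is the per-image count."""
    optimizer.zero_grad()
    pred = model(images)
    loss = loss_fn(pred, gt_counts.float())
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = ViTCountRegressor()
    opt = torch.optim.AdamW(model.head.parameters(), lr=1e-4)
    dummy_images = torch.randn(2, 3, 224, 224)
    dummy_counts = torch.tensor([12.0, 47.0])  # image-level counts only, no points
    print(train_step(model, dummy_images, dummy_counts, opt))
```

Because no point-level or reference-image supervision is used, the training signal is exactly the per-image ground-truth count, which is what distinguishes the weakly-supervised setting from standard density-map counting pipelines.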