DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection

Kumar, Abhinav; Brazil, Garrick; Corona, Enrique; Parchami, Armin; Liu, Xiaoming

Abstract

Modern neural networks use building blocks such as convolutions that are equivariant to arbitrary 2D translations. However, these vanilla blocks are not equivariant to arbitrary 3D translations in the projective manifold. Even so, all monocular 3D detectors use vanilla blocks to obtain the 3D coordinates, a task for which the vanilla blocks are not designed. This paper takes the first step towards convolutions equivariant to arbitrary 3D translations in the projective manifold. Since depth is the hardest quantity to estimate for monocular detection, this paper proposes the Depth EquiVarIAnt NeTwork (DEVIANT), built with existing scale-equivariant steerable blocks. As a result, DEVIANT is equivariant to depth translations in the projective manifold, whereas vanilla networks are not. The additional depth equivariance forces DEVIANT to learn consistent depth estimates; therefore, DEVIANT achieves state-of-the-art monocular 3D detection results on the KITTI and Waymo datasets in the image-only category and performs competitively with methods that use extra information. Moreover, DEVIANT works better than vanilla networks in cross-dataset evaluation. Code and models are at https://github.com/abhi1kumar/DEVIANT
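Why scale-equivariant blocks relate to depth translations can be seen from pinhole projection. The following is a minimal sketch of that geometry only, not the DEVIANT implementation. It assumes an ideal pinhole camera with hypothetical intrinsics `f`, `cx`, `cy` and a fronto-parallel planar patch, meaning every point shares the same depth `z`. Under those assumptions, translating the patch along the optical axis by `tz` rescales its image about the principal point by `z / (z + tz)`.

```python
import numpy as np


def project(points, f, cx, cy):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates (ideal pinhole model)."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * X / Z + cx, f * Y / Z + cy], axis=1)


def translate_depth(points, tz):
    """Translate every point along the optical (Z) axis by tz."""
    return points + np.array([0.0, 0.0, tz])


def implied_scale(z, tz):
    """Image-plane scale factor for a fronto-parallel patch moved from depth z to z + tz."""
    return z / (z + tz)


# Hypothetical intrinsics and a fronto-parallel square patch at depth z = 10 (assumed values).
f, cx, cy = 700.0, 320.0, 240.0
z, tz = 10.0, 5.0
patch = np.array([[-1.0, -1.0, z], [1.0, -1.0, z], [1.0, 1.0, z], [-1.0, 1.0, z]])

# Express projections relative to the principal point so scaling is about that point.
principal = np.array([cx, cy])
before = project(patch, f, cx, cy) - principal
after = project(translate_depth(patch, tz), f, cx, cy) - principal

s = implied_scale(z, tz)
print(round(s, 3))                     # -> 0.667
print(np.allclose(after, s * before))  # -> True
```

For points at different depths the ratio `z / (z + tz)` differs per point, so for non-planar objects a depth translation corresponds to an image scaling only approximately.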