8 months ago

Object Detection

Depth Estimation

Convolutional Neural Network

Method/Architecture

Computer Vision

Shubham Shrivastava, Punarjay Chakravarty

Abstract

We introduce a method for 3D object detection using a single monocular image. Starting from a synthetic dataset, we pre-train an RGB-to-Depth Auto-Encoder (AE). The embedding learnt from this AE is then used to train a 3D Object Detector (3DOD) CNN, which regresses the parameters of 3D object poses after the encoder from the AE generates a latent embedding from the RGB image. We show that we can pre-train the AE using paired RGB and depth images from simulation data once and subsequently only train the 3DOD network using real data, comprising RGB images and 3D object pose labels (without the requirement of dense depth). Our 3DOD network utilizes a particular 'cubification' of 3D space around the camera, where each cuboid is tasked with predicting N object poses, along with their class and confidence values. The AE pre-training and this method of dividing the 3D space around the camera into cuboids give our method its name - CubifAE-3D. We demonstrate results for monocular 3D object detection in the Autonomous Vehicle (AV) use-case with the Virtual KITTI 2 and the KITTI datasets.
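
To make the cubification idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the grid extents, the number of cuboids per axis, the camera axis convention (x right, y down, z forward) and the names `CuboidGrid`, `locate` and `assign_objects` are all assumptions chosen for illustration. It only shows how ground-truth object centres could be binned into cuboids, with at most N objects kept per cuboid. The auto-encoder and the detector network are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Index3 = Tuple[int, int, int]


@dataclass(frozen=True)
class CuboidGrid:
    """Hypothetical regular grid of cuboids in the camera frame."""

    # Assumed extents in metres (x right, y down, z forward).
    x_range: Tuple[float, float] = (-40.0, 40.0)
    y_range: Tuple[float, float] = (-5.0, 5.0)
    z_range: Tuple[float, float] = (0.0, 100.0)
    # Assumed number of cuboids along x, y and z.
    shape: Index3 = (4, 1, 4)

    def locate(self, center: Vec3) -> Optional[Tuple[Index3, Vec3]]:
        """Return (cuboid index, offset in [0, 1) inside that cuboid),
        or None if the point lies outside the grid."""
        ranges = (self.x_range, self.y_range, self.z_range)
        index: List[int] = []
        offset: List[float] = []
        for value, (lo, hi), n in zip(center, ranges, self.shape):
            if not lo <= value < hi:
                return None
            scaled = (value - lo) / (hi - lo) * n
            i = min(int(scaled), n - 1)  # guard against rounding at the edge
            index.append(i)
            offset.append(scaled - i)
        return (index[0], index[1], index[2]), (offset[0], offset[1], offset[2])


def assign_objects(
    grid: CuboidGrid, centers: Sequence[Vec3], max_per_cuboid: int
) -> Dict[Index3, List[Vec3]]:
    """Group object centres by cuboid, keeping at most N per cuboid.

    Each stored value is the centre's normalised offset inside its cuboid,
    one plausible regression target for a per-cuboid prediction slot.
    """
    slots: Dict[Index3, List[Vec3]] = {}
    for center in centers:
        located = grid.locate(center)
        if located is None:
            continue  # object outside the modelled volume
        index, offset = located
        bucket = slots.setdefault(index, [])
        if len(bucket) < max_per_cuboid:
            bucket.append(offset)
    return slots
```

For example, `assign_objects(CuboidGrid(), [(1.0, 0.0, 5.0), (-30.0, 0.0, 90.0)], max_per_cuboid=2)` places the first object in cuboid `(2, 0, 0)` and the second in `(0, 0, 3)`. In a full detector, each slot would additionally carry the remaining pose parameters, a class label and a confidence value, as the abstract describes.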

CubifAE-3D: Monocular Camera Space Cubification for Auto-Encoder based 3D Object Detection