
Depthwise Separable Temporal Convolutional Network for Action Segmentation

Heiko Neumann, Wolfgang Mader, Christian Jarvers, Basavaraj Hampiholi
Abstract

Fine-grained temporal action segmentation in long, untrimmed RGB videos is a key topic in visual human-machine interaction. Recent temporal convolution based approaches either use an encoder-decoder (ED) architecture or dilations with a doubling factor in consecutive convolution layers to segment actions in videos. However, ED networks operate on low temporal resolution, and the dilations in successive layers cause gridding artifacts. We propose a depthwise separable temporal convolution network (DS-TCN) that operates on full temporal resolution and with reduced gridding effects. The basic component of DS-TCN is the residual depthwise dilated block (RDDB). We explore the trade-off between large kernels and small dilation rates using RDDB. We show that our DS-TCN is capable of capturing long-term dependencies as well as local temporal cues efficiently. Our evaluation on three benchmark datasets, GTEA, 50Salads, and Breakfast, demonstrates that DS-TCN outperforms the existing ED-TCN and dilation-based TCN baselines even with comparatively fewer parameters.
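
The abstract makes two arithmetic claims: depthwise separable convolutions need fewer parameters, and kernel size trades off against dilation rate. The sketch below illustrates both with standard formulas. It is not the authors' configuration. The channel count (64), kernel sizes, and dilation schedules are hypothetical values chosen for the example, and the helper names are made up here.

```python
def standard_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameters of a dense 1D convolution: every output channel
    sees every input channel at every kernel tap."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def depthwise_separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameters of a depthwise conv (one temporal filter per input
    channel) followed by a pointwise 1x1 conv that mixes channels."""
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def receptive_field(kernel_size, dilations):
    """Temporal receptive field (in frames) of stacked stride-1 dilated
    convolutions: each layer adds (kernel_size - 1) * dilation frames."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


if __name__ == "__main__":
    channels = 64  # hypothetical width, not taken from the paper

    # Same kernel size, dense versus depthwise separable.
    print(standard_conv1d_params(channels, channels, 3))
    print(depthwise_separable_conv1d_params(channels, channels, 3))

    # Small kernel with doubling dilations (hypothetical 10-layer stack)...
    doubling = [2 ** i for i in range(10)]
    print(receptive_field(3, doubling))

    # ...versus a larger kernel with small dilations (hypothetical).
    print(receptive_field(15, [1, 2, 4, 8]))
```

With 64 input and output channels and kernel size 3, the dense layer has 12,352 parameters and the separable one has 4,416. Because the depthwise cost grows only linearly with kernel size, a separable layer can afford a larger kernel, which is what makes smaller dilation rates an option.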
