
Dense Temporal Convolution Network for Sign Language Translation

Dan Guo, Shuo Wang, Qi Tian, Meng Wang
Abstract

Sign language translation (SLT), which aims to translate a sign language video into natural language, is weakly supervised because there is no exact mapping between visual actions and the words of a sentence label. To align sign language actions and translate them into their respective words automatically, this paper proposes a dense temporal convolution network, termed DenseTCN, which captures the actions in hierarchical views. Within this network, a temporal convolution (TC) is designed to learn the short-term correlation among adjacent features and is then extended to a dense hierarchical structure. In the $k^{\mathrm{th}}$ TC layer, we integrate the outputs of all preceding layers, which has two effects: (1) the TC in a deeper layer has a larger receptive field and thus captures long-term temporal context through hierarchical content transition; (2) the integration addresses the SLT problem from different views, including embedded short-term and extended long-term sequential learning. Finally, we adopt the CTC loss and a fusion strategy to learn the feature-wise classification and generate the translated sentence. Experimental results on two popular sign language benchmarks, i.e., PHOENIX and USTC-ConSents, demonstrate the effectiveness of the proposed method under various measurements.
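The dense connectivity described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not say how preceding outputs are integrated, so channel-wise concatenation (DenseNet-style) is assumed here. The kernel size, the growth width, the ReLU activation, the random weights, and all function names (`temporal_conv`, `dense_tcn`, `init_layers`, `receptive_field`, `ctc_greedy_decode`) are hypothetical. The fusion strategy and CTC training are not shown; the greedy collapse at the end only illustrates how a CTC-style frame labelling maps to an output sequence.

```python
import random


def temporal_conv(seq, weights, bias):
    """1D convolution over time with zero 'same' padding, followed by ReLU.

    seq:     list of T frames, each a list of C_in floats
    weights: nested list shaped [C_out][kernel][C_in] (odd kernel assumed)
    bias:    list of C_out floats
    """
    kernel = len(weights[0])
    pad = kernel // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        frame = []
        for w, b in zip(weights, bias):
            acc = b
            for k in range(kernel):
                acc += sum(wi * xi for wi, xi in zip(w[k], padded[t + k]))
            frame.append(max(acc, 0.0))  # ReLU (an assumption)
        out.append(frame)
    return out


def init_layers(c_in, growth, num_layers, kernel=3, seed=0):
    """Random weights; layer k sees c_in + k * growth input channels."""
    rng = random.Random(seed)
    layers, channels = [], c_in
    for _ in range(num_layers):
        weights = [[[rng.uniform(-0.1, 0.1) for _ in range(channels)]
                    for _ in range(kernel)]
                   for _ in range(growth)]
        layers.append((weights, [0.0] * growth))
        channels += growth
    return layers


def dense_tcn(seq, layers):
    """Each TC layer receives the concatenated outputs of all preceding layers."""
    features = [seq]  # the input plus every layer output so far
    for weights, bias in layers:
        merged = [sum((f[t] for f in features), []) for t in range(len(seq))]
        features.append(temporal_conv(merged, weights, bias))
    # Final representation: short-term and long-term views side by side.
    return [sum((f[t] for f in features), []) for t in range(len(seq))]


def receptive_field(num_layers, kernel=3):
    """Frames seen by the deepest layer for stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel - 1)


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks (standard CTC collapsing)."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


if __name__ == "__main__":
    video = [[float(t + c) for c in range(4)] for t in range(6)]  # T=6, C=4
    feats = dense_tcn(video, init_layers(c_in=4, growth=2, num_layers=3))
    print(len(feats), len(feats[0]))                      # 6 10
    print(receptive_field(3))                             # 7
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))    # [3, 3, 5]
```

With three layers of kernel size 3, the deepest layer sees 7 frames while the first sees only 3, which is the sense in which deeper TC layers capture longer-term context. The concatenated output keeps both views available to the classifier.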