Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN

Hand Gesture Recognition (HGR) enables intuitive human-computer interactions in various real-world contexts. However, existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that simplifies the recognition of dynamic hand gestures into a static image classification task, effectively reducing both hardware and computational demands. Our framework utilizes a data-level fusion technique to encode 3D skeleton data from dynamic gestures into static RGB spatiotemporal images. It incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that optimizes the semantic connections between data representations while minimizing computational needs. Tested across five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the framework showed performance competitive with the state of the art. Its capability to support real-time HGR applications was also demonstrated through deployment on standard consumer PC hardware, showcasing low latency and minimal resource usage in real-world settings. The successful deployment of this framework underscores its potential to enhance real-time applications in fields such as virtual/augmented reality, ambient intelligence, and assistive technologies, providing a scalable and efficient solution for dynamic gesture recognition.
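To make the idea of encoding a dynamic gesture as a static image concrete, the following is a minimal illustrative sketch and not the encoding used in this study. It assumes a gesture is given as an array of shape (frames, joints, 3), projects the joints orthographically onto the x-y plane, and colors each frame's joints from blue (earliest) to red (latest) so that temporal order survives in a single RGB image. The function name `encode_skeleton_sequence`, the choice of projection, and the color scheme are all hypothetical.

```python
import numpy as np


def encode_skeleton_sequence(seq, size=64):
    """Rasterize a (frames, joints, 3) skeleton sequence into one RGB image.

    Assumed scheme (illustrative only): drop the z axis, scale x-y to the
    image grid, and encode time as a blue-to-red color ramp.
    """
    seq = np.asarray(seq, dtype=np.float64)
    if seq.ndim != 3 or seq.shape[2] != 3:
        raise ValueError("expected an array of shape (frames, joints, 3)")

    num_frames = seq.shape[0]

    # Orthographic projection onto the x-y plane (an assumed view).
    xy = seq[:, :, :2]

    # Normalize coordinates over the whole sequence to [0, size - 1].
    lo = xy.min(axis=(0, 1))
    hi = xy.max(axis=(0, 1))
    span = np.where(hi - lo > 0, hi - lo, 1.0)  # avoid division by zero
    pixels = np.rint((xy - lo) / span * (size - 1)).astype(int)

    image = np.zeros((size, size, 3), dtype=np.uint8)
    for t in range(num_frames):
        # Temporal position in [0, 1]; a single-frame input maps to 1.0.
        a = t / (num_frames - 1) if num_frames > 1 else 1.0
        color = np.array(
            [round(255 * a), 0, round(255 * (1 - a))], dtype=np.uint8
        )
        # Rows index y, columns index x; later frames overwrite earlier ones.
        image[pixels[t, :, 1], pixels[t, :, 0]] = color
    return image
```

Calling `encode_skeleton_sequence(seq)` on a sequence of any length returns a `(64, 64, 3)` `uint8` array, which an ordinary image classifier can consume. A practical encoding would likely also draw the bones between joints and render several view orientations, one per CNN stream, but those details are beyond this sketch.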