Video-to-Image Affordance Grounding
Video-to-image affordance grounding is a computer vision subtask that takes a demonstration video and a target image as input. It analyzes the regions where hands interact with an object in the video, then generates a heatmap over the target image marking the corresponding interaction regions and labels the action involved (for example, pressing or rotating). Because the task localizes the operable parts of an object and identifies their function, it has application value for robot manipulation, human-computer interaction, and augmented reality.
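To make the output of the task concrete, the following is a minimal sketch of one way a result could be represented: a heatmap over the target image paired with an action label. It is not a grounding model. The names `AffordancePrediction` and `points_to_heatmap`, the Gaussian rendering, the `sigma` value, and the example coordinates are all illustrative assumptions; in a real system the interaction points would come from a model that has processed the demonstration video.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AffordancePrediction:
    """Hypothetical container for one result on a target image."""
    heatmap: np.ndarray  # shape (H, W), values in [0, 1]
    action: str          # e.g. "press" or "rotate"


def points_to_heatmap(points, height, width, sigma=10.0):
    """Render (x, y) interaction points as a sum of Gaussians, scaled to [0, 1].

    This is an assumed rendering scheme, used here only to illustrate
    what a heatmap over the target image looks like as data.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        heatmap += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    peak = heatmap.max()
    if peak > 0:
        heatmap /= peak  # normalize so the strongest response equals 1
    return heatmap


# Made-up interaction points on a 64 x 96 target image (assumption).
prediction = AffordancePrediction(
    heatmap=points_to_heatmap([(40, 30), (44, 32)], height=64, width=96),
    action="press",
)
```

Keeping the heatmap and the action label in one record reflects the two outputs the task describes: where on the object the interaction happens, and what kind of interaction it is.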