STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding