Video Question Answering on NExT-QA (Efficient)

1:1 Accuracy

Evaluation Results

Performance of each model on this benchmark

Model	1:1 Accuracy	Paper Title
ViLA (3B, 4 frames)	74.4	ViLA: Efficient Video-Language Alignment for Video Question Answering
SeViLA (4 frames)	73.8	Self-Chained Image-Language Model for Video Localization and Question Answering
