Robot Manipulation on SimplerEnv

Evaluation results

Performance of each model on this benchmark

Model	Variant Aggregation	Variant Aggregation-Move Near	Variant Aggregation-Open/Close Drawer	Variant Aggregation-Pick Coke Can	Visual Matching	Visual Matching-Move Near	Visual Matching-Open/Close Drawer	Visual Matching-Pick Coke Can	Paper Title
SpatialVLA	0.688	0.717	0.362	0.895	0.719	0.696	0.593	0.810	SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
SoFar	0.676	0.740	0.297	0.907	0.749	0.917	0.403	0.923	SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
RT-2-X	0.661	0.792	0.353	0.823	0.606	0.779	0.250	0.787	RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
RoboVLM	0.463	0.560	0.085	0.683	0.563	0.663	0.268	0.727	Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
TraceVLA	0.450	0.564	0.310	0.600	0.460	0.600	0.240	0.560	TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
OpenVLA	0.411	0.477	0.177	0.545	0.277	0.462	0.356	0.163	OpenVLA: An Open-Source Vision-Language-Action Model
RT-1-X	0.397	0.323	0.294	0.490	0.534	0.317	0.597	0.567	RT-1: Robotics Transformer for Real-World Control at Scale
Octo-Base	0.012	0.031	0.011	0.006	0.168	0.042	0.227	0.170	Octo: An Open-Source Generalist Robot Policy
