HyperAI超神経

Conversational Web Navigation on WebLINX

Evaluation Metrics

Element (IoU)
Intent Match
Overall score
Text (F1)
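
For readers unfamiliar with the metric names, the following is a minimal sketch of the generic definitions behind "IoU" and "F1". It assumes elements are compared as axis-aligned bounding boxes and that text is compared by whitespace-token overlap. The function names `box_iou` and `token_f1` are hypothetical, and the benchmark's actual scoring code may differ (for example, in how text is tokenized or how scores are combined into the overall score).

```python
from collections import Counter


def box_iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def token_f1(pred, ref):
    """F1 over whitespace tokens (an assumed tokenization, for illustration only)."""
    pred_tokens, ref_tokens = pred.split(), ref.split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(token_f1("click the button", "click button"))
```

Both functions return values in the range 0 to 1; the leaderboard reports its scores on a 0 to 100 scale.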

Evaluation Results

Performance results of each model on this benchmark

| Model | Element (IoU) | Intent Match | Overall score | Text (F1) | Paper Title |
|---|---|---|---|---|---|
| GPT-3.5T (Zero-Shot) | 8.62 | 42.77 | 8.51 | 3.45 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| S-LLaMA-1.3B | 20.54 | 83.32 | 23.73 | 25.85 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Pix2Act-1.3B | 8.28 | 81.80 | 16.88 | 25.21 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| MindAct-3B | 16.50 | 79.89 | 20.94 | 23.16 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Fuyu-8B | 15.70 | 80.07 | 19.97 | 22.30 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Llama-2-13B | 22.82 | 81.91 | 25.21 | 26.60 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| GPT-3.5F | 18.64 | 77.56 | 21.22 | 22.39 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| MindAct-780M | 13.39 | 75.87 | 15.13 | 13.58 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Flan-T5-780M | 15.36 | 80.02 | 17.27 | 14.05 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| MindAct-250M | 12.05 | 74.25 | 12.63 | 7.67 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Pix2Act-282M | 6.20 | 79.71 | 12.51 | 16.40 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| S-LLaMA-2.7B | 22.60 | 84.00 | 25.02 | 27.17 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| GPT-4T (Zero-Shot) | 10.85 | 41.66 | 10.72 | 6.75 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Flan-T5-250M | 14.86 | 79.69 | 14.99 | 9.21 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Flan-T5-3B | 20.31 | 81.14 | 23.77 | 25.75 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| GPT-4V (Zero-Shot) | 10.91 | 42.36 | 10.45 | 6.21 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |
| Llama-2-7B | 22.26 | 82.64 | 24.57 | 26.50 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue |