Mind2Web
Evaluation Metrics
cross-domain_ele. acc
cross-domain_op. f1
cross-domain_sr
cross-domain_step sr
cross-task_ele. acc
cross-task_op. f1
cross-task_sr
cross-task_step sr
cross-website_ele. acc
cross-website_op. f1
cross-website_sr
cross-website_step sr
llm_model
model_url
organization
parameters
release_date
updated_time
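
The four metric suffixes above (`ele. acc`, `op. f1`, `step sr`, `sr`) are reported once per split (cross-task, cross-website, cross-domain). The following is a minimal sketch of how such scores could be computed for a single task. It assumes the commonly described Mind2Web definitions: element accuracy checks whether the chosen element is among the acceptable ones, operation F1 is a token-level F1 over the predicted operation string, a step succeeds only when both are fully correct, and a task succeeds only when every step does. The `Step` class, its field names, `token_f1`, and `score_task` are hypothetical names used for illustration, and the official scorer may differ in details such as tokenization and averaging.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    # Hypothetical record for one predicted action (not the benchmark's schema).
    pred_element: str        # id of the element the model selected
    gold_elements: set       # ids of all acceptable elements
    pred_operation: str      # e.g. "TYPE hello"
    gold_operation: str      # e.g. "TYPE hello world"


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two whitespace-tokenized strings."""
    p, g = pred.split(), gold.split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def score_task(steps: list) -> dict:
    """Score one task; values are fractions in [0, 1], not percentages."""
    if not steps:
        raise ValueError("a task needs at least one step")
    ele = [s.pred_element in s.gold_elements for s in steps]
    f1 = [token_f1(s.pred_operation, s.gold_operation) for s in steps]
    # Assumption: a step counts as successful only if the element is right
    # and the operation matches exactly (F1 of 1.0).
    step_ok = [e and f == 1.0 for e, f in zip(ele, f1)]
    n = len(steps)
    return {
        "ele. acc": sum(ele) / n,
        "op. f1": sum(f1) / n,
        "step sr": sum(step_ok) / n,
        "sr": float(all(step_ok)),
    }
```

Under these assumptions, `sr` is much stricter than `step sr`: a single failed step zeroes out the whole task, so task-level success can sit far below step-level success.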
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model name | cross-domain_ele. acc | cross-domain_op. f1 | cross-domain_sr | cross-domain_step sr | cross-task_ele. acc | cross-task_op. f1 | cross-task_sr | cross-task_step sr | cross-website_ele. acc | cross-website_op. f1 | cross-website_sr | cross-website_step sr | llm_model | model_url | organization | parameters | release_date | updated_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model 1 | 33.9 | 67.3 | 1.6 | 31.6 | 43.6 | 76.8 | 4.0 | 41.0 | 32.1 | 67.6 | 1.7 | 29.5 | Flan-T5B | https://huggingface.co/docs/transformers/model_doc/flan-t5 | N/A | 2022.10.20 | 2023.11.9 |