HyperAI超神经

Mind2Web

Evaluation Metrics

cross-domain_ele. acc
cross-domain_op. f1
cross-domain_sr
cross-domain_step sr
cross-task_ele. acc
cross-task_op. f1
cross-task_sr
cross-task_step sr
cross-website_ele. acc
cross-website_op. f1
cross-website_sr
cross-website_step sr
llm_model
model_url
organization
parameters
release_date
updated_time
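
The abbreviated metric names above (ele. acc, op. f1, step sr, sr) are not defined on this page. The sketch below shows one plausible way to compute them for a single split, assuming the usual Mind2Web reading: element accuracy checks the selected element against the acceptable elements, operation F1 is a token-level F1 over the predicted operation string, a step succeeds only when both are fully correct, a task succeeds only when every step succeeds, and step-wise metrics are macro-averaged over tasks. The function names and the dictionary keys (`pred_element`, `gold_elements`, `pred_op`, `gold_op`) are hypothetical and chosen for illustration; this is not the benchmark's official scoring script.

```python
from collections import Counter


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two whitespace-tokenized operation strings."""
    pred_tokens, gold_tokens = pred.split(), gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_task(steps: list[dict]) -> dict[str, float]:
    """Score one task; each step dict uses the hypothetical keys named above."""
    ele_hits, op_f1s, step_hits = [], [], []
    for step in steps:
        ele_ok = step["pred_element"] in step["gold_elements"]
        f1 = token_f1(step["pred_op"], step["gold_op"])
        ele_hits.append(ele_ok)
        op_f1s.append(f1)
        # Assumption: a step succeeds only if element and operation are both exact.
        step_hits.append(ele_ok and f1 == 1.0)
    n = len(steps)
    return {
        "ele_acc": sum(ele_hits) / n,
        "op_f1": sum(op_f1s) / n,
        "step_sr": sum(step_hits) / n,
        "sr": float(all(step_hits)),  # task-level success: every step succeeded
    }


def score_split(tasks: list[list[dict]]) -> dict[str, float]:
    """Macro-average the per-task scores and report them as percentages."""
    per_task = [score_task(steps) for steps in tasks]
    return {
        key: 100 * sum(result[key] for result in per_task) / len(per_task)
        for key in per_task[0]
    }
```

Under these assumptions, sr is much stricter than step sr, because a single failed step fails the whole task.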

Evaluation Results

Performance of each model on this benchmark

Comparison Table
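| Model name | cross-domain_ele. acc | cross-domain_op. f1 | cross-domain_sr | cross-domain_step sr | cross-task_ele. acc | cross-task_op. f1 | cross-task_sr | cross-task_step sr | cross-website_ele. acc | cross-website_op. f1 | cross-website_sr | cross-website_step sr | llm_model | model_url | organization | parameters | release_date | updated_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model 1 | 33.9 | 67.3 | 1.6 | 31.6 | 43.6 | 76.8 | 4.0 | 41.0 | 32.1 | 67.6 | 1.7 | 29.5 | Flan-T5B | https://huggingface.co/docs/transformers/model_doc/flan-t5 | Google | N/A | 2022.10.20 | 2023.11.9 |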