MASSW Scientific Workflow Dataset
The MASSW (Multi-Aspect Summarization of Scientific Workflows) dataset is a text dataset that summarizes multiple aspects of scientific workflows. It was jointly released in 2024 by researchers from the University of Michigan, Ann Arbor, Purdue University, and LG AI Research Institute, alongside the paper "MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows".
MASSW contains more than 152k peer-reviewed publications from 17 top computer science conferences, spanning the past 50 years. Its core feature is a definition of 5 key aspects of the scientific workflow: context, key ideas, methods, results, and expected impact. Information is extracted from each publication along these aspects and organized into a structured summary. This makes the content of each paper easier to access and supports downstream tasks and analyses.
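As a rough illustration of what a five-aspect structured summary might look like in code, the sketch below models one record as a Python dataclass. The class name, field names, helper function, and sample text are all hypothetical and chosen for illustration only; the released dataset may use different keys and a different file layout.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class WorkflowSummary:
    """One structured summary; field names are assumed, not taken from the dataset."""
    context: str
    key_idea: str
    method: str
    result: str
    expected_impact: str


def missing_aspects(summary: WorkflowSummary) -> List[str]:
    """Return the names of aspects whose text is empty or only whitespace."""
    return [name for name, text in asdict(summary).items() if not text.strip()]


# Invented placeholder text, not an actual record from MASSW.
example = WorkflowSummary(
    context="Existing summaries treat a paper as a single block of text.",
    key_idea="Summarize each paper along several workflow aspects.",
    method="Extract each aspect from the paper with a language model.",
    result="Structured summaries for a large set of publications.",
    expected_impact="",  # left empty to show the completeness check
)

print(missing_aspects(example))
```

Keeping each aspect as a separate field makes it straightforward to filter or compare publications on a single aspect, for example checking which records lack an expected-impact statement.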