HyperAI

MASSW Scientific Workflow Dataset

Date

8 months ago

Size

998.33 MB

Publish URL

github.com

The MASSW (Multi-Aspect Summarization of Scientific Workflows) dataset is a comprehensive text dataset for summarizing multiple aspects of scientific workflows. It was released jointly in 2024 by researchers from the University of Michigan, Ann Arbor, Purdue University, and LG AI Research Institute. The accompanying paper is "MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows".

MASSW contains more than 152k peer-reviewed publications from 17 leading computer science conferences, spanning the past 50 years. Its core feature is the definition of five key aspects of the scientific workflow: context, key idea, method, results, and expected impact. For each publication, these aspects guide the extraction and organization of information into a structured summary. This structure makes the information easier to access and supports a range of downstream tasks and analyses.
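
To make the five-aspect structure concrete, here is a minimal sketch of how one structured summary could be held in code. The class name `WorkflowSummary`, the `paper_id` field and the snake_case field names are hypothetical; they mirror the aspects listed above, not necessarily the column names used in the released files (check README.md for those).

```python
from dataclasses import dataclass, fields


# Hypothetical record type: field names follow the five aspects described in
# the text, not necessarily the actual column names in massw_v1.tsv.
@dataclass
class WorkflowSummary:
    paper_id: str
    context: str
    key_idea: str
    method: str
    result: str
    expected_impact: str

    def aspects(self) -> dict[str, str]:
        """Return the five aspect fields (everything except paper_id), in order."""
        return {f.name: getattr(self, f.name) for f in fields(self) if f.name != "paper_id"}

    def missing_aspects(self) -> list[str]:
        """List aspects whose summary text is empty or whitespace-only."""
        return [name for name, text in self.aspects().items() if not text.strip()]


s = WorkflowSummary("p1", "ctx", "idea", "method", "", "impact")
print(s.missing_aspects())  # ['result']
```

Keeping each aspect as a separate field, rather than one free-text summary, is what allows per-aspect filtering or comparison across papers.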

MASSW.torrent
  • MASSW/
    • README.md
      1.69 KB
    • README.txt
      3.39 KB
    • data/
      • MASSW/
        • massw_metadata_v1.jsonl
          854.73 MB
        • massw_v1.tsv
          998.33 MB
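
The data ships as a tab-separated summary file and a JSON Lines metadata file. Below is a stdlib-only sketch for streaming both and joining them. It assumes that the TSV has a header row and that both files share an identifier column called `id`. That column name is a guess, so check README.md for the real key. The helper names `iter_tsv`, `iter_jsonl` and `join_on_key` are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Iterable, Iterator

# Long text fields can exceed the csv module's default 128 KB field limit.
csv.field_size_limit(2**31 - 1)


def iter_tsv(path: Path) -> Iterator[dict[str, str]]:
    """Yield one dict per row of a tab-separated file with a header line."""
    with path.open(newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f, delimiter="\t")


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def join_on_key(summaries: Iterable[dict], metadata: Iterable[dict], key: str = "id") -> list[dict]:
    """Merge each summary row with the metadata record sharing the same key.

    `key` is an assumed column name. Metadata keys are converted to str because
    TSV values are always strings while JSON ids may be numbers. Summary rows
    without matching metadata are dropped.
    """
    meta_by_key = {str(m[key]): m for m in metadata}
    return [{**meta_by_key[s[key]], **s} for s in summaries if s[key] in meta_by_key]
```

Assuming the archive is unpacked in the current directory, usage could look like:

```python
root = Path("MASSW/data/MASSW")
rows = join_on_key(iter_tsv(root / "massw_v1.tsv"), iter_jsonl(root / "massw_metadata_v1.jsonl"))
```

The metadata index is held fully in memory, so with a file of roughly 850 MB you may prefer to keep only the metadata fields you need before building the dictionary.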