HyperAI

SocialMaze Logical Reasoning Benchmark Dataset

Date: 19 days ago

Publish URL: huggingface.co

SocialMaze is a social reasoning benchmark dataset focused on hidden-role reasoning in multi-agent interaction scenarios. It is designed to evaluate the logical reasoning, deception detection, and multi-round dialogue understanding of large language models (LLMs) in complex social settings, and provides a standardized test platform for studying the social reasoning capabilities of LLMs.

The dataset is built around a hidden-role deduction game that simulates social scenarios involving deception and misjudgment:

Role settings:

  • Investigator: Always tells the truth.
  • Criminal: May lie selectively to mislead the other players.
  • Rumormonger: Believes they are an Investigator, but their statements are randomly true or false.
  • Lunatic: Believes they are the Criminal, and their statements are randomly true or false.
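To make the four behaviours concrete, here is a minimal Python sketch of how a single statement might be generated for each role. It is not the dataset's actual generator: the `Role` enum, the `PERCEIVED_ROLE` mapping, the `make_statement` function, its `lie_prob` parameter, and the 50/50 coin flip used for "randomly true or false" are all hypothetical choices made for illustration.

```python
import random
from enum import Enum


class Role(Enum):
    INVESTIGATOR = "Investigator"
    CRIMINAL = "Criminal"
    RUMORMONGER = "Rumormonger"
    LUNATIC = "Lunatic"


# The role each player *believes* they hold (self-perception bias).
PERCEIVED_ROLE = {
    Role.INVESTIGATOR: Role.INVESTIGATOR,
    Role.CRIMINAL: Role.CRIMINAL,
    Role.RUMORMONGER: Role.INVESTIGATOR,  # thinks they are an Investigator
    Role.LUNATIC: Role.CRIMINAL,          # thinks they are the Criminal
}


def make_statement(role: Role, target_is_criminal: bool,
                   rng: random.Random, lie_prob: float = 0.5) -> bool:
    """Return the speaker's claim: True = "target is the criminal", False = "is not".

    lie_prob is a hypothetical knob for how often the Criminal lies; the real
    dataset may model the Criminal's selective lying differently.
    """
    if role is Role.INVESTIGATOR:
        return target_is_criminal  # always truthful
    if role is Role.CRIMINAL:
        lie = rng.random() < lie_prob  # selective lying, modelled as a biased coin
        return (not target_is_criminal) if lie else target_is_criminal
    # Rumormonger and Lunatic: the claim is random, independent of the truth
    return rng.random() < 0.5
```

In this sketch the Rumormonger and the Lunatic produce statements with the same distribution; what separates them is only the role they believe they hold, which matters when Player 1 reasons about their own identity.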

Game flow:

Each game consists of three rounds of dialogue. In each round, every player publicly claims that a certain player is or is not the criminal. Player 1 (the model's perspective) must infer, from the records of all three rounds, both the actual criminal and their own true role, which may be any of the four above.

The core challenge is to separate true statements from random or deliberate falsehoods, to account for self-perception biases (the Rumormonger and the Lunatic both hold a false belief about their own identity), and to use contradictions and consistencies across the rounds of dialogue to eliminate impossible options step by step until a single solution remains.
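The elimination strategy described above can be sketched as a brute-force search: enumerate every possible role assignment, discard those that contradict the record, and collect the (criminal, Player 1 role) pairs that survive. This is a hypothetical illustration, not the benchmark's reference solver. It assumes a composition of one Criminal, one Rumormonger, one Lunatic and Investigators in the remaining seats, represents each statement as a `(speaker, target, claims_criminal)` tuple with all rounds flattened into one list, and applies only two constraints: Investigators never lie, and Player 1's true role must be compatible with the role they perceive.

```python
from itertools import permutations

# Hypothetical record format: (speaker, target, claims_target_is_criminal)
Statement = tuple[int, int, bool]

# Which true roles are compatible with the role Player 1 is told they have.
COMPATIBLE = {
    "Investigator": {"Investigator", "Rumormonger"},
    "Criminal": {"Criminal", "Lunatic"},
}


def solve(n_players: int, perceived_p1: str,
          statements: list[Statement]) -> set[tuple[int, str]]:
    """Return every (criminal, Player 1's true role) pair consistent with the record."""
    players = range(1, n_players + 1)
    answers = set()
    # Hypothetical composition: one Criminal, one Rumormonger, one Lunatic,
    # everyone else an Investigator.
    for crim, rumor, luna in permutations(players, 3):
        roles = {p: "Investigator" for p in players}
        roles[crim], roles[rumor], roles[luna] = "Criminal", "Rumormonger", "Lunatic"
        if roles[1] not in COMPATIBLE[perceived_p1]:
            continue  # contradicts what Player 1 believes about themselves
        # Investigators never lie; the other roles may say anything.
        if all(claim == (target == crim)
               for speaker, target, claim in statements
               if roles[speaker] == "Investigator"):
            answers.add((crim, roles[1]))
    return answers


if __name__ == "__main__":
    # Toy one-round record for 5 players; Player 1 believes they are an Investigator.
    record = [(1, 2, True), (2, 4, True), (3, 4, True), (4, 1, True), (5, 3, True)]
    print(solve(5, "Investigator", record))  # {(4, 'Rumormonger')}
```

A record is solved when the returned set has exactly one element. In the toy record, assuming Player 1 is truthful forces Player 2 to be the criminal, which then leaves no seat for a consistent Investigator, so Player 1 must be the Rumormonger.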