SocialMaze Logical Reasoning Benchmark Dataset

Date: 6 months ago

Size: 169.48 MB

SocialMaze is a social reasoning benchmark dataset that focuses on hidden role reasoning tasks in multi-agent interaction scenarios. It aims to evaluate the logical reasoning, deception detection, and multi-round dialogue understanding capabilities of large language models (LLMs) in complex social environments. It provides a standardized testing platform for studying the social reasoning capabilities of LLMs.

This dataset is designed around a hidden role reasoning game, simulating social scenarios involving deception and misjudgment:

Role settings:

  • Investigator: Always makes truthful statements.
  • Criminal: May selectively lie to mislead the other players.
  • Rumormonger: Believes they are an Investigator, but their statements are randomly true or false.
  • Lunatic: Believes they are the Criminal, and their statements are randomly true or false.
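The role rules above can be sketched as a small statement generator. This is only an illustration, not the dataset's actual generation code: the function name `make_claim` and the `criminal_lie_prob` parameter are hypothetical, and modeling the Criminal's "selective" lying as a fixed probability is an assumption.

```python
import random

ROLES = ("Investigator", "Criminal", "Rumormonger", "Lunatic")

def make_claim(role, target_is_criminal, rng, criminal_lie_prob=0.5):
    """Return True if the speaker claims the target is the criminal.

    `criminal_lie_prob` is a hypothetical knob: the source only says the
    Criminal "may selectively lie", so a fixed probability is an assumption.
    """
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    if role == "Investigator":
        # Investigators always report the truth.
        return target_is_criminal
    if role == "Criminal":
        # The Criminal flips the truth with some probability.
        lie = rng.random() < criminal_lie_prob
        return (not target_is_criminal) if lie else target_is_criminal
    # Rumormonger and Lunatic: the claim is random, so it is true or
    # false by chance, regardless of what the speaker believes.
    return rng.random() < 0.5

rng = random.Random(42)  # seeded for reproducibility
claims = {role: make_claim(role, True, rng) for role in ROLES}
```

Note that the Rumormonger and Lunatic differ only in what they believe about themselves, not in how their statements behave, which is why the sketch treats them identically.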

Game flow:

Each game consists of 3 rounds of dialogue. In each round, every player publicly states whether a particular player is the criminal. Player 1 (i.e., the model's perspective) must infer the real criminal and its own true role (possibly any one of the four above) from the records of the three rounds.

The core challenge is to distinguish true statements from random falsehoods, to account for self-perception biases (Rumormongers and Lunatics hold false beliefs about their own identity), and to use logical contradictions and consistencies across the rounds of dialogue to eliminate impossible options step by step until only one solution remains.
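The elimination process described above can be sketched as a brute-force search. This is a minimal illustration under stated assumptions, not the benchmark's reference solver: the function name `consistent_assignments`, the `(speaker, target, claim)` tuple layout, and the role counts in the toy game are all hypothetical, since the page does not specify the number of players or the record format. The only hard constraint used is that every Investigator's statement must be true; all other roles are left unconstrained.

```python
from itertools import permutations

def consistent_assignments(players, role_counts, statements):
    """Yield every role assignment in which all Investigator statements hold.

    players:     list of player ids
    role_counts: dict role -> count (assumed to contain exactly one "Criminal")
    statements:  iterable of (speaker, target, claim), where claim is True
                 if the speaker says the target is the criminal
    """
    roles = [role for role, n in role_counts.items() for _ in range(n)]
    if len(roles) != len(players):
        raise ValueError("role counts must sum to the number of players")
    statements = list(statements)
    # set() removes duplicate orderings caused by repeated roles.
    for perm in set(permutations(roles)):
        assignment = dict(zip(players, perm))
        criminal = next(p for p, r in assignment.items() if r == "Criminal")
        # Keep the assignment only if no Investigator is caught in a falsehood.
        if all((target == criminal) == claim
               for speaker, target, claim in statements
               if assignment[speaker] == "Investigator"):
            yield assignment

# Toy game (hypothetical): 4 players, two rounds of public statements.
players = [1, 2, 3, 4]
role_counts = {"Investigator": 2, "Criminal": 1, "Rumormonger": 1}
statements = [
    (1, 3, True), (2, 3, True), (3, 4, True), (4, 1, True),     # round 1
    (1, 4, False), (2, 1, False), (3, 3, False), (4, 3, True),  # round 2
]
solutions = list(consistent_assignments(players, role_counts, statements))
```

For this toy input a single assignment survives: players 1 and 2 are Investigators, player 3 is the Criminal, and player 4 is the Rumormonger. Every other candidate criminal would require two Investigators whose statements contradict it.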

SocialMaze.torrent
  • SocialMaze/
    • README.md (1.89 KB)
    • README.txt (3.79 KB)
    • data/
      • SocialMaze.zip (169.48 MB)
