4 hours ago
LLM
Agent

If LLMs Have Human-Like Attributes, Then So Does Age of Empires II

Adrian de Wynter

Abstract

A great deal of research has been carried out on large language models (LLMs) and LLM-driven agentic workflows. However, many studies in this area attribute, posit, or presuppose generalised anthropomorphic attributes in these systems (for example, morality or natural language understanding). Our aim is not to argue for or against the existence of these attributes, but to point out that such conclusions could be mistaken. To that end, we designed and trained a simple neural network on the videogame Age of Empires II, and we find that any entity operating in a sufficiently powerful substrate, such as LEGO or the Greater Boston Area, could also exhibit such attributes. Consequently, the purported anthropomorphic attributes of LLMs are not empirically unique: while some properties (for example, responses to prompts) may remain constant, others, such as the interpretation of their perceived behaviour, may vary with the substrate. Any empirically grounded discussion therefore requires explicit measurement criteria; otherwise, interpretation is left to representation alone. We then show that asserting the existence or absence of these attributes in a system, independently of the substrate and in a generalised manner, leads to conclusions that are either circular or uninformative, whatever the experimenter's viewpoint.

One-sentence Summary

Adrian de Wynter demonstrates that anthropomorphic attributes ascribed to large language models are empirically non-unique by training a simple neural network on Age of Empires II to show that perceived behaviour depends on the substrate, arguing that generalised assumptions lead to circular or uninformative conclusions and require explicit measurement criteria for empirically-grounded discussion.

Key Contributions

  • A simple neural network is built and trained on the videogame Age of Empires II to demonstrate that purported anthropomorphic attributes are empirically non-unique and vary with the underlying substrate.
  • A null assumption is proposed where experiments avoid presupposing anthropomorphic attributes to ensure conclusions remain sound and robust.
  • Perceived anthropomorphism varies heavily with interface presentation, indicating that many anthropomorphic measurements assess presentation rather than actual system behavior.

Introduction

Research evaluating Large Language Models frequently presupposes the existence of human-like attributes such as empathy or moral reasoning. This methodological approach often leads to circular or uninformative conclusions because the experimental design relies on the very assumption it seeks to test. To address this, the authors train a simple neural network within the video game Age of Empires II to demonstrate that anthropomorphic behavior can emerge in any sufficiently powerful substrate. They argue that perceived intelligence depends heavily on representation and propose a null assumption framework to facilitate rigorous experiments that do not inherently bias results toward or against anthropomorphism.

Dataset

  • Dataset Composition and Sources

    • The authors collected scientific articles by querying the Semantic Scholar API and retrieving papers from ArXiv.
    • The search query targeted titles matching 'agent llm' within a specific timespan from 1 May 2024 to 1 May 2026.
    • The collection process implemented timeouts and backoff mechanisms to avoid overloading external services.
  • Filtering and Processing

    • An initial deduplication step removed entries based on exact title matches.
    • Semantic filtering was conducted using a calibrated LLM-as-a-judge, specifically GPT-5.2.
    • The filtering pipeline excluded works that were not scientific articles or did not feature LLMs as the central aspect of study.
    • Additional prompts classified document types and determined if LLMs were the subject of study for pre-annotation.
  • Dataset Size and Sampling

    • The authors randomly sampled a subset of 1,024 papers from the initial filtered pool.
    • The final curated dataset consists of 315 papers after applying all filtering and labeling rules.
  • Metadata and Labeling

    • Papers were labeled regarding human-like attributes, including assumptions, study focus, and conclusions.
    • Emergent properties claimed by the works were identified as a free-form list and manually normalized.
    • Labels indicate whether papers assumed, studied, or concluded that LLMs possess human-like attributes.
  • Usage and Ethics

    • The dataset supports a survey analysis of anthropomorphic assumptions in LLM research rather than model training.
    • No human subjects were used in the study, and the crawling was performed responsibly.
    • The labelled and anonymised dataset is available in the repository, while survey code remains unreleased due to licensing and ethics considerations.
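
The collection and deduplication steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' survey code (which is unreleased): `with_backoff`, `dedupe_by_title`, the retry count, and the delay values are all hypothetical names and settings chosen for the example, and the `fetch` callable stands in for whatever request the crawler actually made.

```python
import random
import time


def with_backoff(fetch, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff plus jitter on failure.

    ASSUMPTION: the retry count and delays are illustrative, not the
    values used in the study.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:  # broad catch for brevity; narrow this in real code
            if attempt == retries - 1:
                raise
            # Delay doubles on each attempt; jitter spreads out retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


def dedupe_by_title(papers):
    """Keep the first paper for each exact title, preserving input order."""
    seen = set()
    unique = []
    for paper in papers:
        if paper["title"] not in seen:
            seen.add(paper["title"])
            unique.append(paper)
    return unique


if __name__ == "__main__":
    sample = [{"title": "A"}, {"title": "B"}, {"title": "A"}]
    print(dedupe_by_title(sample))
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting. Exact-title matching is deliberately strict: near-duplicates with different capitalisation or punctuation would survive this step.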

Method

The authors establish the functional completeness and Turing-completeness of Age of Empires II (AoE II) to demonstrate that any neural network can be implemented within the game's engine. This is achieved by constructing fundamental logic gates, specifically NAND gates, using in-game units and triggers. Building upon this foundation, the authors implement a perceptron, a fundamental building block of neural networks, using a bipolar 1-bit architecture that avoids floating-point arithmetic.
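
To illustrate why a NAND gate alone is enough for functional completeness, the sketch below derives the other common gates from it in ordinary Python. It says nothing about how the gates are wired from in-game units and triggers; the function names are chosen for this example only.

```python
def nand(a: bool, b: bool) -> bool:
    """The only primitive: false exactly when both inputs are true."""
    return not (a and b)


def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))


def xor(a: bool, b: bool) -> bool:
    # Classic four-NAND construction of XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def xnor(a: bool, b: bool) -> bool:
    return not_(xor(a, b))


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, and_(a, b), or_(a, b), xor(a, b), xnor(a, b))
```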

The perceptron implementation utilizes a bipolar representation where bits are mapped to $\{-1, +1\}$ rather than standard binary, allowing for the representation of negative weights and biases necessary for learning. The core architecture consists of two parallel XNOR gates whose outputs are fed into an AND gate, which acts as the Heaviside step function $h(z)$. In this specific configuration, the bias term is hardcoded into the AND gate logic to simplify the circuit.
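
A software analogue of this forward pass might look like the sketch below. It assumes that true maps to +1 and false to -1, that the step function returns a bipolar value rather than 0/1, and that the hardcoded bias is -1; none of these details are confirmed by the summary. Under that encoding, XNOR is simply multiplication.

```python
def xnor(a: int, b: int) -> int:
    """Bipolar XNOR: +1 when the inputs agree, -1 otherwise (equals a * b)."""
    return a * b


def heaviside(z: int) -> int:
    """Step function with bipolar output (ASSUMPTION: -1 instead of 0)."""
    return 1 if z > 0 else -1


def and_gate(a: int, b: int) -> int:
    # ASSUMPTION: bias of -1 folded into the gate, so that
    # a + b - 1 > 0 holds only when a = b = +1.
    return heaviside(a + b - 1)


def perceptron(x, w):
    """Two parallel XNOR gates feeding one AND gate."""
    return and_gate(xnor(x[0], w[0]), xnor(x[1], w[1]))


if __name__ == "__main__":
    for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        print(x, perceptron(x, (1, 1)))
```

With weights `(1, 1)` each XNOR passes its input through unchanged, so the unit computes AND. More generally, under these assumptions the output is +1 only when the input equals the weight vector.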

As shown in the figure below:

To train the perceptron to learn the AND function, the authors adopt an ansatz-based training algorithm suitable for the constraints of the 1-bit hardware. The training circuit takes the true label $t$, the input vector $x$, and the current weights $w$ as inputs. The process begins by computing the perceptron's output $f(x)$ and comparing it with the true label $t$ to determine the error $\epsilon$. This error is calculated using an XOR operation, $\epsilon = \text{XOR}(f(x), t)$.

The circuit then evaluates whether the weights need updating. If the error is non-zero, the weights are updated according to the rule $w \leftarrow w + \eta \epsilon x$, where the learning rate $\eta$ is set to 1. The implementation includes logic to compare the new weight set with the current one: if they are identical, the process stops; otherwise, it retries.

Refer to the framework diagram for the detailed circuit layout of this training algorithm:

This approach leverages the concurrency control provided by the bipolar bit representation, where each logical bit is represented by two physical rails (or goats in the game context) to manage signal timing and avoid race conditions. While this ansatz-based strategy is less sophisticated than standard gradient descent, it successfully demonstrates the capability to train a perceptron within the game's environment.
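
The training loop described in this section can be approximated in software as below. This is a loose sketch rather than the authors' circuit. In particular, a 1-bit weight cannot hold the sum $w + \eta \epsilon x$, so the example assumes a hypothetical `requantize` step that squashes the sum back to $\{-1, +1\}$ and resolves a tie toward the input bit. The error is treated as a 0/1 flag, and one "retry" is taken to be a full pass over the truth table. All of these are assumptions.

```python
from itertools import product


def perceptron(x, w):
    """Bipolar XNOR per input (a product), then AND with bias -1 folded in."""
    agree = [xi * wi for xi, wi in zip(x, w)]
    return 1 if sum(agree) - 1 > 0 else -1


def error_flag(y, t):
    """XOR of prediction and label: 1 when they differ, 0 when they match."""
    return int(y != t)


def requantize(s, x_i):
    """ASSUMPTION: map w_i + eta*eps*x_i back to one bipolar bit; 0 follows x_i."""
    if s == 0:
        return x_i
    return 1 if s > 0 else -1


def train(w, data, eta=1, max_rounds=10):
    w = tuple(w)
    for _ in range(max_rounds):
        new_w = w
        for x, t in data:
            eps = error_flag(perceptron(x, new_w), t)
            new_w = tuple(
                requantize(wi + eta * eps * xi, xi) for wi, xi in zip(new_w, x)
            )
        if new_w == w:  # weights unchanged after a full pass: stop
            return w
        w = new_w       # otherwise retry with the updated weights
    return w


# Truth table for AND in bipolar form (+1 = true, -1 = false).
AND_DATA = [
    ((a, b), 1 if a == 1 and b == 1 else -1)
    for a, b in product((-1, 1), repeat=2)
]

if __name__ == "__main__":
    print(train((-1, -1), AND_DATA))
```

Under these assumptions, an error on a negative example leaves the weights where they are, so convergence comes entirely from the single positive example `(1, 1)`. That is enough for AND, but it is a property of this sketch and not a claim about the in-game circuit.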

Experiment

The analysis examines the validity of measuring anthropomorphic attributes in LLMs, demonstrating that mechanistic analysis and substrate invariance do not inherently prevent circular reasoning unless assumptions are explicitly stated. A parallel corpus study validates the prevalence of these methodological issues, revealing that most papers assume human-like traits and frequently conclude that they exist without independent verification. Consequently, the findings highlight a systemic reliance on accept-or-reject setups that often yield uninformative results regarding emergent capabilities.

