Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
Kevin Qinghong Lin Batu EI Yuhong Shi Pan Lu Philip Torr James Zou
Abstract
Data tells stories that shape society; the data journalist's role is to turn raw information into stories that non-experts can trust. Producing a high-quality feature takes a newsroom team several weeks: researching context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual steps well: data science agents close the analysis loop, while design agents synthesize polished websites. But can an agent fill the data journalist's role end to end? We present Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialized roles within a single virtual newsroom. Data2Story makes two innovations. (i) Claims are grounded in evidence: an Inspector links every number, angle, and asset to data, code, or an external reference. (ii) Articles are multimodally generative: rather than defaulting to plain text and static charts, Data2Story reasons about what readers want to see, then deploys multimodal tools, such as interactive maps for geography and audio for music. We evaluate Data2Story on 18 articles, each paired with the originally published expert piece, along four axes: (a) human-agent angle coverage; (b) a rubric evaluation with 53 participants across five dimensions; (c) computer-use agents as judges, an inexpensive proxy for how readers navigate interactive articles; and (d) verifiability, where a coding verifier re-executes statements against the data and checks claims against references. Data2Story produces competitive, evidence-traceable multimedia stories, with a
One-sentence Summary
Data2Story is a multi-agent framework that orchestrates specialized roles into a virtual newsroom to produce end-to-end data journalism, featuring an Inspector module for evidence-grounded claims and audience-tailored multimodal generation, and it yields competitive, transparent narratives across 18 articles evaluated through human-agent angle coverage, a 53-participant rubric study, computer-use agent navigation, and automated claim verification, ultimately functioning as a verifiable supplement to human reporting.
Key Contributions
- Data2Story is a multi-agent framework that orchestrates specialized roles to automatically generate complete multimedia news articles from raw data. An Inspector agent explicitly links all numerical claims, visual assets, and narrative angles to verifiable sources including raw datasets, executable code, or external URLs.
- A Designer agent dynamically generates topic-specific multimedia elements such as interactive maps and playable simulations by reasoning about audience preferences. This multimodal generation capability ensures the final output aligns with both the data subject matter and the intended readership.
- The framework is evaluated across 18 diverse articles paired with expert-written counterparts using human rubric scoring, computer-use agent navigation proxies, and automated coding verification. The system produces competitive, evidence-traceable stories with superior transparency and claim-level auditability, while human journalists retain an advantage in editorial angle and creative design.
Method
The authors introduce a multi-agent framework termed the Virtual Newsroom, which automates the end-to-end process of data journalism. As illustrated in the overview below, the system transforms raw data into a narrative story enriched with multimodal elements through an intelligent agent pipeline.
The detailed architecture is presented in the framework diagram below. The pipeline consists of several specialized agent roles. The process begins with a raw dataset D, which is processed by the Detective agent. The Detective augments the raw data with external context obtained via web search, creating an enriched corpus that combines D with the retrieved context. Next, the Analyst agent writes Python code to perform statistical analysis on the enriched data, generating a set of results R and corresponding scripts C. The Editor agent then reviews these findings to construct an editorial plan and a prose outline, producing a set of findings F.
The Designer agent creates multimedia assets V, such as images, videos, or interactive widgets, to complement the narrative. The Programmer agent then assembles these artifacts into a final HTML page U. If the Auditor agent detects visual or structural defects in the rendered page, it provides revision suggestions S, which the Programmer uses to refine the output.
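The role hand-offs described above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: every function body is a hypothetical stub (no web search, no model calls), the `NewsroomState` class and the `max_rounds` bound are assumptions, and only the role names and the symbols D, R, C, F, V, U, and S come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class NewsroomState:
    """Artifacts passed between roles; field names follow the symbols in the text."""
    D: list                                        # raw dataset records
    context: list = field(default_factory=list)    # external context (Detective)
    R: dict = field(default_factory=dict)          # analysis results (Analyst)
    C: dict = field(default_factory=dict)          # script behind each result (Analyst)
    F: list = field(default_factory=list)          # findings (Editor)
    V: list = field(default_factory=list)          # multimedia asset specs (Designer)
    U: str = ""                                    # rendered HTML page (Programmer)
    S: list = field(default_factory=list)          # revision suggestions (Auditor)


def detective(state):
    # Stub for web search: attach a single hypothetical context note.
    state.context = ["hypothetical background note"]
    return state


def analyst(state):
    # Stub analysis: compute the mean and record the expression that produced it.
    values = [float(x) for x in state.D]
    state.R["mean"] = sum(values) / len(values)
    state.C["mean"] = "sum(values) / len(values)"
    return state


def editor(state):
    state.F = [f"The mean value is {state.R['mean']:.1f}."]
    return state


def designer(state):
    state.V = ["bar_chart:mean"]
    return state


def programmer(state):
    # Assemble the page; apply the Auditor's suggestion when one is pending.
    title = "<h1>Story</h1>" if "add a headline" in state.S else ""
    body = "".join(f"<p>{finding}</p>" for finding in state.F)
    assets = "".join(f'<div data-asset="{spec}"></div>' for spec in state.V)
    state.U = f"<html><body>{title}{body}{assets}</body></html>"
    return state


def auditor(state):
    # Stub structural check: the page must contain a headline.
    state.S = [] if "<h1>" in state.U else ["add a headline"]
    return state


def run_newsroom(raw, max_rounds=3):
    state = NewsroomState(D=list(raw))
    for role in (detective, analyst, editor, designer, programmer):
        state = role(state)
    # Audit-and-revise loop, bounded so it always terminates.
    for _ in range(max_rounds):
        state = auditor(state)
        if not state.S:
            break
        state = programmer(state)
    return state


if __name__ == "__main__":
    print(run_newsroom(["1", "2", "3"]).U)
```

Keeping all artifacts in one state object is a design choice of this sketch; it makes every intermediate product available to a later evidence-linking step.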
To ensure the verifiability of the generated content, the system employs an Inspector module. As shown in the figure below, the Inspector binds every element of the final article back to its supporting evidence. It aggregates atomic units of evidence from upstream agents, including context items D, results R, code C, findings F, and visual specifications V. The Inspector decomposes the final HTML page into fragments and links each fragment to the specific code line or external reference that grounds it. This creates a traceable evidence chain, allowing readers to verify claims by following links to the original data, code, or source material.
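One way to picture the fragment-to-evidence binding is the sketch below. It assumes fragments are the text of `<p>` elements and that a fragment is linked to an evidence unit when a number in the fragment matches the value that unit supports; both rules, the `Evidence` class, and the locator `analysis.py:12` are hypothetical simplifications, since the summary does not specify how the Inspector performs matching.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Evidence:
    kind: str      # "data", "code", or "reference"
    locator: str   # hypothetical locator, e.g. "analysis.py:12" or a URL
    value: str     # the number this evidence unit supports


class FragmentParser(HTMLParser):
    """Collects the text of each <p> element as one fragment."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.fragments.append("".join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


NUMBER = re.compile(r"\d+(?:\.\d+)?")


def link_fragments(html, evidence):
    """Map each fragment to the evidence units whose value appears in it."""
    parser = FragmentParser()
    parser.feed(html)
    links = {}
    for fragment in parser.fragments:
        numbers = set(NUMBER.findall(fragment))
        links[fragment] = [e for e in evidence if e.value in numbers]
    return links


if __name__ == "__main__":
    page = "<p>Sales rose 12.5 percent.</p><p>Experts were surprised.</p>"
    units = [Evidence("code", "analysis.py:12", "12.5")]
    for fragment, linked in link_fragments(page, units).items():
        print(fragment, "->", [e.locator for e in linked] or "UNLINKED")
```

Fragments that map to an empty list are the ones a reader could not trace, so a real system would presumably flag them for revision.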
Experiment
The evaluation compares Data2Story-generated multimedia articles against human-written references from diverse publications using human readers, a computer-use agent proxy, and an automated provenance verifier to assess narrative quality, judge alignment, and traceability. Results indicate that the system reliably captures straightforward analytical angles and consistently outperforms human baselines in transparency and claim-data alignment, though it struggles to fully replicate highly creative editorial storytelling. The agent-as-judge protocol successfully mirrors human preferences at a fraction of the cost, while the built-in Inspector module proves essential for establishing machine-auditable evidence trails. Ultimately, Data2Story demonstrates that automated agents can effectively bridge data analysis and data journalism by producing verifiable, multimedia-rich narratives that meet professional standards.
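The idea of re-executing claims against the data can be illustrated with a toy check. This is a sketch under stated assumptions, not the paper's verifier: each claim is assumed to carry a stated value plus a function that recomputes it, and the `Claim` class, the `rel_tol` tolerance, and the sample numbers are all hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Claim:
    text: str                                        # sentence as it appears in the article
    stated: float                                    # number quoted in the sentence
    recompute: Callable[[Sequence[float]], float]    # how to re-derive it from the data


def verify(claims, data, rel_tol=1e-2):
    """Re-run each claim's computation and compare it with the stated value."""
    return {
        claim.text: math.isclose(claim.recompute(data), claim.stated, rel_tol=rel_tol)
        for claim in claims
    }


if __name__ == "__main__":
    data = [10.0, 20.0, 30.0]  # toy values, not from the paper
    claims = [
        Claim("The average is 20.", 20.0, mean),
        Claim("The maximum is 35.", 35.0, max),
    ]
    for text, ok in verify(claims, data).items():
        print("PASS" if ok else "FAIL", text)
```

A relative tolerance is used here so that rounded figures in prose still pass; how strictly the actual verifier compares values is not stated in the summary.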
Agent-generated articles, particularly those produced with the Inspector, achieve higher average scores than their human-written counterparts in every tested category. Adding the Inspector yields a notable improvement over the agent's output without it. The gap over human baselines is widest for TidyTuesday articles and narrowest for Pudding articles.
The agent achieves higher mean scores than human authors on all five rubric dimensions, with the largest gains in transparency and claim alignment. The gap is wider for analytical sources, such as economics and community datasets, than for highly designed, curated editorial pieces, where results are comparable to human baselines. In pairwise comparisons, a significant majority of reviewers preferred the agent-generated articles, consistent with the rubric scores.
In the human evaluation, agent-generated articles outscore their human-written counterparts on every rubric dimension. The advantage is largest in transparency and claim-data alignment and smallest in visual design. By publication style, the agent leads clearly on analytical and community-driven sources but ties with human authors on highly curated, long-form editorial pieces. Holistic reader preference also favors the agent, consistent with the per-dimension results.
The study also compares the textual composition and claim coverage of the agent with those of human data journalists. The agent writes in a more granular style, using more and shorter sentences. It covers a majority of the claims made in the human-written articles across publication sources, while also generating many claims that do not appear in the human references. Coverage varies by source and is highest for brief-style articles.
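Claim coverage of this kind can be expressed as set arithmetic over extracted claims. The sketch below assumes claims have already been extracted as short sentences and treats two claims as the same when they match after lowercasing and whitespace cleanup; that exact-match rule and the sample claims are hypothetical, and the paper's own matching procedure is not described in this summary.

```python
def normalize(claim):
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(claim.lower().split()).rstrip(".")


def claim_coverage(human_claims, agent_claims):
    """Share of human claims the agent also makes, plus the two set differences."""
    human = {normalize(c) for c in human_claims}
    agent = {normalize(c) for c in agent_claims}
    covered = human & agent
    return {
        "coverage": len(covered) / len(human) if human else 0.0,
        "missed": sorted(human - agent),       # human claims the agent did not make
        "agent_only": sorted(agent - human),   # claims unique to the agent
    }


if __name__ == "__main__":
    # Toy claims for illustration only; they are not taken from the paper.
    human = ["Rents rose in 2020.", "Wages were flat"]
    agent = ["rents rose in 2020", "Wages fell in rural areas."]
    print(claim_coverage(human, agent))
```

Reporting `missed` and `agent_only` alongside the ratio mirrors the two observations in the text: how much of the human article is captured, and how many claims are new.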
The evaluation compares AI-generated articles against human-written data journalism pieces across multiple quality rubrics, publication formats, and stylistic analyses to validate the agent's compositional accuracy, transparency, and overall journalistic quality. Results demonstrate that the agent consistently surpasses human baselines across all assessed dimensions, with the Inspector feature driving notable improvements in claim alignment and analytical rigor. While the agent excels in data-driven and community-focused formats, it closely matches human performance in highly curated editorial styles. Overall, reviewers strongly preferred the agent's output, which features a more granular writing style, comprehensive claim coverage, and the generation of valuable unique insights.