Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
Kevin Qinghong Lin, Batu El, Yuhong Shi, Pan Lu, Philip Torr, James Zou
Abstract
Data tell stories that shape society; the data journalist's job is to turn raw information into stories that non-experts can trust. A high-quality news piece takes a newsroom weeks: finding context, running statistical analyses, choosing a journalistic angle, and designing visualizations. Current agents handle individual steps well: data-science agents complete the analysis pipeline, while design agents generate aesthetically polished websites. But can an agent cover a data journalist's entire workflow from start to finish? We introduce the Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialized roles within a single virtual newsroom. Data2Story introduces two innovations. (i) Claims are evidence-grounded: an Inspector links every number, journalistic angle, and asset to the underlying data, code, or an external reference. (ii) Articles are generated multimodally: instead of defaulting to plain text and static charts, Data2Story infers from the readers' perspective what they would want to see and then deploys multimodal tools, such as interactive maps for geographic stories or audio for music.
We evaluate Data2Story on 18 articles, each compared with the originally published expert piece, along four dimensions: (a) coverage of journalistic angles by humans and the agent; (b) rubric-based scoring across five dimensions with 53 participants; (c) computer-use agents as judges, serving as a cost-effective proxy for how readers navigate interactive articles; and (d) verifiability, where a code verifier re-executes statements against the data and cross-checks claims against references. Data2Story produces competitive multimedia stories with end-to-end evidence traceability, with particular strengths in transparency and auditability. Human-written articles, however, retain an edge in editorial angle, creative design, and presentation. We position Data2Story as a tool that supports journalists and enables more evidence-based, transparent, and verifiable reporting. Source code and demos are available at https://data2story.github.io.
One-sentence Summary
Data2Story is a multi-agent framework that organizes specialized roles into a virtual newsroom to produce data journalism end to end, pairing an Inspector module that grounds every claim in evidence with audience-tailored multimodal generation; across 18 articles evaluated through human-agent angle coverage, a 53-participant rubric study, computer-use agent navigation, and automated claim verification, it produces competitive, transparent stories and serves as a verifiable supplement to human reporting.
Key Contributions
- Data2Story is a multi-agent framework that orchestrates specialized roles to automatically generate complete multimedia news articles from raw data. An Inspector agent explicitly links all numerical claims, visual assets, and narrative angles to verifiable sources including raw datasets, executable code, or external URLs.
- A Designer agent dynamically generates topic-specific multimedia elements such as interactive maps and playable simulations by reasoning about audience preferences. This multimodal generation capability ensures the final output aligns with both the data subject matter and the intended readership.
- The framework is evaluated on 18 diverse articles, each paired with an expert-written counterpart, using human rubric scoring, computer-use agents as navigation proxies, and automated code verification. The system produces competitive, evidence-traceable stories with stronger transparency and claim-level auditability, while human journalists retain an advantage in editorial angle and creative design.
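The Designer's audience reasoning can be approximated, for illustration only, by a rule table that maps topic keywords and column names to asset types. This is a hypothetical sketch: the summary says the Designer reasons about reader preferences (for example, interactive maps for geographic stories and audio for music), whereas `choose_asset_types` and the keyword sets below are our own simplification, not the paper's method.

```python
from typing import List

# Hypothetical keyword rules standing in for the Designer's model-based
# reasoning about what readers would want to see.
GEO_HINTS = {"country", "state", "city", "latitude", "longitude", "region"}
MUSIC_HINTS = {"song", "track", "album", "artist", "tempo"}
TIME_HINTS = {"date", "year", "month", "timestamp"}


def choose_asset_types(topic: str, columns: List[str]) -> List[str]:
    """Return an ordered list of asset types suggested by the topic and columns."""
    words = {w.lower() for w in topic.split()} | {c.lower() for c in columns}
    assets: List[str] = []
    if words & GEO_HINTS:
        assets.append("interactive_map")   # geographic data -> map
    if words & MUSIC_HINTS:
        assets.append("audio_player")      # music data -> playable audio
    if words & TIME_HINTS:
        assets.append("line_chart")        # temporal data -> trend chart
    # Fall back to a static chart when no richer medium is suggested.
    if not assets:
        assets.append("static_chart")
    return assets
```

For example, a topic about top songs by country with `artist`, `country`, and `year` columns would yield a map, an audio player, and a line chart. A learned or prompted Designer would replace the fixed keyword sets with judgment about the specific audience.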
Method
The authors introduce a multi-agent framework, the Virtual Newsroom, that automates data journalism end to end. As illustrated in the overview below, the system turns raw data into a narrative story enriched with multimodal elements through a pipeline of specialized agents.
The detailed architecture is presented in the framework diagram below. The pipeline consists of several specialized agent roles. The process begins with a raw dataset D, which is processed by the Detective agent. The Detective augments the raw data with external context obtained via web search, creating an enriched corpus D ∪ D̃, where D̃ denotes the retrieved context. Next, the Analyst agent writes Python code to run statistical analyses on the enriched data, producing a set of results R and the corresponding scripts C. The Editor agent then reviews these results to draw up an editorial plan and a prose outline, producing a set of findings F.
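The data flow from the Detective through the Analyst to the Editor can be sketched as functions that fill in a shared state object. This is a minimal sketch under stated assumptions: the real agents are language-model roles, whereas here `detective`, `analyst`, `editor`, and `StoryState` are hypothetical names, the web search is an injected stand-in function, and the "analysis" is a single mean per column.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class StoryState:
    """Artifacts passed between agents (hypothetical container)."""
    dataset: Dict[str, List[float]]                          # D: raw dataset
    context: List[str] = field(default_factory=list)         # retrieved context
    results: Dict[str, float] = field(default_factory=dict)  # R: results
    scripts: Dict[str, str] = field(default_factory=dict)    # C: code per result
    findings: List[str] = field(default_factory=list)        # F: findings


def detective(state: StoryState, search: Callable[[str], List[str]]) -> StoryState:
    # Enrich the raw data with external context (search is a stand-in).
    for column in state.dataset:
        state.context.extend(search(column))
    return state


def analyst(state: StoryState) -> StoryState:
    # Compute one statistic per column and record the code that produced it.
    for column, values in state.dataset.items():
        key = f"mean_{column}"
        state.results[key] = mean(values)
        state.scripts[key] = f"statistics.mean(dataset[{column!r}])"
    return state


def editor(state: StoryState) -> StoryState:
    # Turn results into findings; a real Editor would also choose an angle.
    for key, value in state.results.items():
        state.findings.append(f"{key} = {value:.2f}")
    return state


def run_pipeline(dataset: Dict[str, List[float]],
                 search: Callable[[str], List[str]]) -> StoryState:
    state = StoryState(dataset=dataset)
    return editor(analyst(detective(state, search)))
```

Keeping each result next to the script that produced it is what later makes evidence binding possible: every finding can point back to a concrete piece of code.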
The Designer agent creates multimedia assets V, such as images, videos, or interactive widgets, to complement the narrative. The Programmer agent then assembles these artifacts into a final HTML page U. If the Auditor agent detects visual or structural defects in the rendered page, it provides revision suggestions S, which the Programmer uses to refine the output.
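The Programmer-Auditor interaction is a render-audit-revise loop. The sketch below assumes a fixed cap on revision rounds, which the summary does not specify; `build_with_audit`, `toy_render`, and `toy_audit` are hypothetical, and the toy auditor checks only for missing image alt text rather than the visual and structural defects a real Auditor would inspect.

```python
from typing import Callable, List


def build_with_audit(assets: List[str],
                     render: Callable[[List[str], List[str]], str],
                     audit: Callable[[str], List[str]],
                     max_rounds: int = 3) -> str:
    """Assemble a page, then revise it until the auditor reports no defects."""
    suggestions: List[str] = []
    page = render(assets, suggestions)       # Programmer: first draft of U
    for _ in range(max_rounds):              # assumed cap on revision rounds
        suggestions = audit(page)            # Auditor: revision suggestions S
        if not suggestions:
            break                            # no defects found, stop revising
        page = render(assets, suggestions)   # Programmer applies S
    return page


def toy_render(assets: List[str], suggestions: List[str]) -> str:
    # Add alt text only once the auditor has asked for it.
    alt = ' alt="chart"' if "add alt text" in suggestions else ""
    return "".join(f'<img src="{a}"{alt}>' for a in assets)


def toy_audit(page: str) -> List[str]:
    # Flag images that lack alt text.
    return ["add alt text"] if "<img" in page and "alt=" not in page else []
```

With these toys, `build_with_audit(["fig1.png"], toy_render, toy_audit)` returns `<img src="fig1.png" alt="chart">` after one revision. The round cap guarantees termination even if the auditor never stops complaining.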
To ensure the verifiability of the generated content, the system employs an Inspector module. As shown in the figure below, the Inspector binds every element of the final article back to its supporting evidence. It aggregates atomic units of evidence from upstream agents, including context items D̃ (the retrieved external context), results R, code C, findings F, and visual specifications V. The Inspector decomposes the final HTML page into fragments and links each fragment to the specific code line or external reference that grounds it. This creates a traceable evidence chain that lets readers verify claims by following links to the original data, code, or source material.
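A simplified version of evidence binding can match the numbers in each article fragment against numeric evidence units. This is a hypothetical sketch, not the Inspector itself: it splits text into sentence fragments, considers only fragments containing numbers, and links a fragment to the first evidence unit whose value matches within a tolerance. The names `Evidence`, `split_fragments`, and `bind` are illustrative, and numbers written with thousands separators (such as "1,100") are not handled.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


@dataclass
class Evidence:
    kind: str     # e.g. "result", "code", "context"
    value: float  # numeric value this unit supports
    source: str   # e.g. a script line or an external reference


def split_fragments(page_text: str) -> List[str]:
    # Hypothetical decomposition: one fragment per sentence.
    parts = re.split(r"(?<=[.!?])\s+", page_text)
    return [p.strip() for p in parts if p.strip()]


def bind(page_text: str, evidence: List[Evidence],
         tol: float = 1e-6) -> Dict[str, Optional[str]]:
    """Map each numeric fragment to a supporting source, or None if unsupported."""
    links: Dict[str, Optional[str]] = {}
    for frag in split_fragments(page_text):
        numbers = [float(n) for n in NUMBER.findall(frag)]
        if not numbers:
            continue  # this sketch only binds fragments that state numbers
        source = None
        for ev in evidence:
            if any(abs(n - ev.value) <= tol for n in numbers):
                source = ev.source
                break
        links[frag] = source  # None flags a number with no evidence
    return links
```

A fragment mapped to `None` is exactly what an auditor would want surfaced: a stated number with no traceable origin.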
Experiment
The evaluation compares Data2Story-generated multimedia articles against human-written references from diverse publications using human readers, a computer-use agent proxy, and an automated provenance verifier to assess narrative quality, judge alignment, and traceability. Results indicate that the system reliably captures straightforward analytical angles and consistently outperforms human baselines in transparency and claim-data alignment, though it struggles to fully replicate highly creative editorial storytelling. The agent-as-judge protocol successfully mirrors human preferences at a fraction of the cost, while the built-in Inspector module proves essential for establishing machine-auditable evidence trails. Ultimately, Data2Story demonstrates that automated agents can effectively bridge data analysis and data journalism by producing verifiable, multimedia-rich narratives that meet professional standards.
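The numeric part of automated verification amounts to re-running each claim's code against the data and comparing the result with the stated value. The sketch below is a hypothetical illustration: `Claim` and `verify_claims` are invented names, the 1% relative tolerance is an assumption, and it does not cover checking claims against external references. It uses `eval` with a stripped-down namespace for brevity; `eval` is not a real sandbox, so untrusted code would need proper isolation.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Claim:
    text: str       # the sentence as it appears in the article
    claimed: float  # the number the sentence asserts
    code: str       # expression that should reproduce the number


def verify_claims(claims: List[Claim], data: Dict[str, Any],
                  rel_tol: float = 0.01) -> Dict[str, bool]:
    """Re-execute each claim's code and check it against the stated value."""
    verdicts: Dict[str, bool] = {}
    for claim in claims:
        # Fresh, minimal namespace per claim (NOT a security sandbox).
        env = {"__builtins__": {}, "data": data, "sum": sum, "len": len}
        try:
            value = float(eval(claim.code, env))
        except Exception:
            verdicts[claim.text] = False  # code that fails cannot support a claim
            continue
        verdicts[claim.text] = math.isclose(value, claim.claimed, rel_tol=rel_tol)
    return verdicts
```

A claim fails either because its code does not run or because the recomputed value disagrees with the text, and both outcomes are worth flagging to an editor.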
The evaluation indicates that agent-generated articles, particularly those produced with the Inspector, achieve higher average scores than their human-written counterparts in every tested category. Adding the Inspector yields a notable improvement over the agent without it. The advantage is largest in the TidyTuesday category and smallest in the Pudding category, where results are closest to the human baselines.
The human rubric evaluation shows that the agent achieves higher mean scores than human authors on all five rubric dimensions. The largest advantages are in transparency and claim-data alignment, and the smallest is in visual design. Broken down by publication source, the agent leads clearly in analytical and community-driven sources, such as economics and community datasets, but performs comparably to humans on highly curated, heavily designed long-form editorial pieces. In pairwise comparisons, a clear majority of reviewers preferred the agent-generated articles, consistent with the per-dimension scores.
The study also compares the textual composition and claim coverage of the agent's articles with those of human data journalists. The agent writes in a more granular style, using more sentences that are shorter on average. It captures a majority of the claims made in the human-written articles while also generating many claims not found in the human references. Coverage varies by source, with the agent capturing the most human claims in brief-style articles.
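The two statistics in this comparison, sentence granularity and claim coverage, can be computed as sketched below. This is a hypothetical illustration: `sentence_stats` and `claim_coverage` are invented names, sentences are split on terminal punctuation, and claims are treated as already-normalized strings so that matching reduces to set intersection, whereas in practice deciding whether two claims match requires judgment.

```python
import re
from typing import Set, Tuple


def sentence_stats(text: str) -> Tuple[int, float]:
    """Return (number of sentences, mean words per sentence)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    return len(sentences), (sum(lengths) / len(lengths) if lengths else 0.0)


def claim_coverage(human: Set[str], agent: Set[str]) -> Tuple[float, int]:
    """Share of human claims the agent also makes, and count of agent-only claims."""
    covered = len(human & agent)
    coverage = covered / len(human) if human else 0.0
    unique = len(agent - human)
    return coverage, unique
```

For example, `sentence_stats("Rents rose. Wages did not keep up with them.")` gives `(2, 4.5)`, and an agent that matches three of four human claims while adding two of its own scores a coverage of 0.75 with 2 unique claims.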
The evaluation compares AI-generated articles against human-written data journalism pieces across multiple quality rubrics, publication formats, and stylistic analyses to validate the agent's compositional accuracy, transparency, and overall journalistic quality. Results demonstrate that the agent consistently surpasses human baselines across all assessed dimensions, with the Inspector feature driving notable improvements in claim alignment and analytical rigor. While the agent excels in data-driven and community-focused formats, it closely matches human performance in highly curated editorial styles. Overall, reviewers strongly preferred the agent's output, which features a more granular writing style, comprehensive claim coverage, and the generation of valuable unique insights.