HyperAIHyperAI

Command Palette

Search for a command to run...

vor 5 Stunden
LLM
Textgenerierung

Unsicherheitsabschätzung basierend auf großen Sprachmodellen für die Bewertung von Lageindikatoren in sozialen Medien zur Krisenberichterstattung

Timothy Douglas Roben Delos Reyes Asanobu Kitamoto

Zusammenfassung

Soziale Medien haben sich im Katastrophenfall zu einer entscheidenden Quelle für situatives Bewusstsein entwickelt, indem sie Echtzeit-Einblicke in sich entwickelnde Auswirkungen und entstehende Bedürfnisse ermöglichen. Um Krisenreaktionen im großen Maßstab zu unterstützen, setzen neuere Arbeiten zunehmend Large Language Models (LLMs) ein, um situative Informationen aus sozialen Medien automatisch zu klassifizieren und zusammenzufassen. Bestehende Ansätze gehen jedoch implizit davon aus, dass extrahierte situative Aussagen gleichermaßen plausibel sind, obwohl die Informationsqualität im Verlauf einer Krise erheblich variiert. In dieser Arbeit schlagen wir einen unsicherheitsbewussten Rahmen für die automatische Erstellung situativer Berichterstattung vor, der die Plausibilität sozialer Medienbeiträge explizit berücksichtigt. Zunächst klassifizieren wir soziale Medienbeiträge nach einem etablierten Schema für situationsbezogenes Bewusstsein. Zweitens führen wir eine Unsicherheitsbewertungsschicht ein, die beurteilt, ob einzelne situative Aussagen unter Berücksichtigung externer Proxy-Daten realistische Bedingungen glaubwürdig widerspiegeln und dabei die Zuversicht des Modells in diese Einschätzung explizit ermittelt. Drittens nutzen wir diese Unsicherheitsbewertungen, um Krisenberichte zu generieren, die nicht nur vermitteln, was gemeldet wird, sondern auch, wie sicher diese Meldungen sind. Wir wenden diesen Rahmen auf über 200.000 erdbebenbezogene Twitter/X-Beiträge an und verwenden dazu Impact-Zusammenfassungen des USGS PAGER als repräsentative externe Proxy-Daten. Wir argumentieren, dass die explizite Darstellung von Unsicherheiten menschliche Krisenkommunikatoren bei der Priorisierung von Informationen unter Zeitdruck unterstützt und einen Rahmen für die Integration externer Proxy-Daten in LLM-basierte situative Bewusstseins-Pipelines bietet.

One-sentence Summary

Addressing the assumption that all extracted social media claims are equally reliable, the authors propose an LLM-based uncertainty-aware framework that classifies posts using a situational awareness schema, evaluates claim plausibility against external proxy data (USGS PAGER impact summaries) while explicitly eliciting model confidence, and generates crisis reports communicating both reported events and their associated certainty across over 200,000 earthquake-related Twitter/X posts.

Key Contributions

  • This work presents an uncertainty-aware framework for automated situational awareness reporting that explicitly evaluates the plausibility of social media claims rather than assuming uniform reliability across disaster response streams.
  • The framework integrates an uncertainty assessment layer that conditions claim plausibility on external proxy data to elicit model confidence, which subsequently drives a quadrant-based report generation process that communicates both situational information and associated certainty levels.
  • Evaluation across six earthquake case studies validates the approach by processing over 200,000 Twitter/X posts and utilizing USGS PAGER impact summaries as external reference data to differentiate signal reliability.

Introduction

Social media has become a critical real-time source of situational awareness during disasters, enabling emergency responders to track evolving impacts and coordinate resources efficiently. Recent automated reporting pipelines leverage large language models to classify and summarize these social media streams, but they typically assume that all extracted claims are equally plausible once filtered. This limitation is problematic because disaster information is often fragmented, contradictory, or unverified, meaning that treating every signal uniformly can amplify noise or obscure genuine but atypical reports. To resolve this gap, the authors leverage an uncertainty-aware framework that conditions an LLM on both individual social media posts and external proxy data to evaluate claim plausibility. By explicitly eliciting model confidence alongside these plausibility scores, the system structures downstream crisis reports around distinct uncertainty profiles, allowing human analysts to triage information and prioritize actionable insights under time pressure.

Dataset

  • Dataset Composition and Sources

    • The authors use short-form, user-generated posts from X (formerly Twitter) as the primary social media source.
    • The core collection is a curated earthquake corpus originally compiled by Li et al., which has been widely adopted in crisis informatics benchmarks.
    • External real-world proxy data is sourced from the USGS PAGER system and ShakeMap, providing near-real-time impact estimates and shaking intensity metrics.
  • Key Details for Each Subset

    • The social media subset contains geolocated earthquake tweets spanning multiple events with varying magnitudes, geographic contexts, and impact profiles.
    • The proxy data subset provides structured, quantitative impact indicators that align spatially with the events under study.
    • No explicit size metrics or filtering rules are detailed, but the corpus relies on prior peer-reviewed curation and direct mapping to USGS intensity estimates.
  • Usage and Processing Workflow

    • The authors feed the curated tweets and corresponding USGS proxy data into a zero-shot LLM prompting framework to evaluate claim plausibility during disaster response.
    • Rather than relying on traditional training splits or mixture ratios, the data is used for direct evaluation and uncertainty modeling, with the framework explicitly designed to remain platform-agnostic.
    • Processing involves aligning tweet-level situational reports with event-level contextual evidence, enabling the model to systematically cross-reference social media claims against independent physical impact estimates.
  • Metadata Construction and Output Formatting

    • The processed dataset is structured into a standardized JSON schema per tweet.
    • Each entry includes the original index, unique tweet ID, situational categories, a 1-to-5 likert plausibility score, a 0-to-100 confidence metric, a discrepancy assessment, and a brief reasoning alignment.
    • The pipeline also supports synthetic or simulated social media streams, allowing for evaluation under constrained data conditions.

Method

The proposed framework for uncertainty-aware crisis reporting operates as a modular pipeline designed to transform raw social media data into structured reports that explicitly convey the epistemic uncertainty of situational claims. The overall architecture consists of three sequential stages: situational awareness (SA) classification, uncertainty assessment, and report generation. As shown in the framework diagram, the process begins with data collection and the extraction of raw tweets, which are then processed by the SA classification model to assign one or more situational awareness labels. This step leverages zero-shot LLM prompting to treat classification as a multi-label task, allowing a single tweet to be associated with multiple SA categories, thereby capturing the multifaceted nature of crisis-related content. The output of this stage is a dataset of tweets enriched with SA labels.

The second stage, uncertainty assessment, constitutes the core methodological contribution. This module evaluates the plausibility and confidence of each SA-labeled tweet. Plausibility is defined as the degree to which a tweet's situational claim is consistent with independently available, event-specific contextual evidence, while confidence reflects the model's self-expressed certainty in its own plausibility judgment. To perform this assessment, the framework incorporates structured external proxy data, which provides spatially aligned impact estimates for the crisis event. For each tweet, the model receives a prompt that combines the tweet's textual content, its assigned SA label, and a location-specific summary of the proxy data. This structured prompt enables the model to reason jointly over the claim and the external context. The uncertainty assessment is implemented using a confidence elicitation framework, which decomposes the task into three key components: a prompting strategy, a sampling strategy, and an aggregation strategy. The prompting strategy instructs the model to produce a graded plausibility score on a five-point Likert scale and a confidence score on a 0–100 scale. The sampling strategy generates multiple independent responses for each tweet to capture the variability in reasoning. Finally, the aggregation strategy combines these multiple responses to produce stable estimates of plausibility and confidence. The specific techniques for sampling and aggregation, such as Top-K and consistency-based methods, are adapted from prior work on confidence elicitation in LLMs.

The output of the uncertainty assessment stage is a set of tweets, each annotated with a pair of scores representing plausibility and confidence. These scores are then fed into the final stage, the crisis reporting model, which synthesizes the uncertainty-aware situational claims into structured reports. The report generation process leverages the plausibility and confidence estimates to differentiate between information with varying degrees of certainty, allowing the system to produce summaries that not only state what is being claimed but also transparently communicate where uncertainty exists. This enables downstream users to prioritize information more effectively. The framework further operationalizes the plausibility-confidence space by partitioning it into four quadrants, which provides an interpretable structure for comparing and analyzing different types of situational signals.

Experiment

The evaluation employs a large language model pipeline across six earthquake case studies to compare uncertainty-conditioned crisis reports against a uniform baseline. The first experiment validates whether the framework introduces structured epistemic differentiation by assigning graded plausibility and confidence scores to situational claims, successfully revealing measurable dispersion across distinct uncertainty profiles. The second experiment examines whether this tweet-level variation propagates into downstream report generation, demonstrating that uncertainty conditioning meaningfully alters semantic composition and internal coherence. Ultimately, the framework functions as a diagnostic tool that translates epistemic variation into structured narrative pathways, although dominant crisis themes can occasionally constrain downstream divergence.

The authors evaluate a framework that introduces uncertainty-aware structuring into situational awareness systems, assessing whether it creates measurable variation in situational claims and influences downstream report composition. Results show that the framework generates reports with distinct semantic characteristics, including differences in internal coherence and cross-report similarity, depending on the uncertainty profile of the input data. The framework introduces structured variation in uncertainty estimates at the tweet level, leading to differentiated situational claims. Reports generated from different uncertainty profiles exhibit measurable differences in internal coherence and semantic similarity. The impact of uncertainty conditioning varies across events, with some reports showing substantial divergence from baseline summaries.

The authors evaluate a framework for uncertainty-aware situational awareness in crisis reporting by examining how uncertainty estimates at the tweet level influence downstream report composition. Results show that the framework introduces structured variation in epistemic assessments and leads to measurable differences in the semantic content and internal coherence of generated reports, though the extent of divergence varies across events. The framework introduces structured variation in uncertainty estimates at the tweet level, leading to differentiated situational claims. Reports generated from different uncertainty-conditioned tweet subsets exhibit measurable differences in semantic content and internal coherence. The impact of uncertainty on report generation varies by event, with some cases showing significant divergence and others maintaining similarity due to dominant crisis themes.

The authors evaluate a framework for uncertainty-aware situational awareness in crisis reporting, focusing on whether uncertainty estimates introduce structured variation in situational claims and whether this variation affects downstream report composition. The results show that the framework produces measurable differences in both the semantic content and internal coherence of generated reports across different uncertainty conditions. The framework introduces structured variation in uncertainty estimates at the tweet level, leading to differentiated situational claims. Reports generated under different uncertainty conditions exhibit measurable differences in semantic content and internal coherence. The impact of uncertainty on report composition varies by event, with some cases showing significant divergence and others maintaining similarity due to dominant crisis themes.

The authors evaluate an uncertainty-aware framework for situational awareness in crisis reporting by examining whether it introduces structured variation in situational claims and whether this variation affects the semantic composition of generated reports. Results show that the framework produces measurable differences in both the differentiation of uncertainty estimates at the tweet level and in the informational structure and internal coherence of downstream reports, with variation across events depending on the underlying data distribution. The framework introduces structured variation in uncertainty estimates at the tweet level, differentiating claims based on plausibility and confidence. Generated reports exhibit measurable differences in semantic content and internal coherence when conditioned on uncertainty profiles. The impact of uncertainty on report composition varies across events, with some showing distinct summaries and others maintaining similarity due to dominant thematic content.

The authors evaluate a framework for uncertainty-aware situational awareness by examining whether uncertainty estimates introduce structured variation in situational claims and whether this variation affects the composition of generated crisis reports. Results show that the framework produces measurable differences in both the entropy of uncertainty estimates and the semantic structure of reports, with some reports exhibiting distinct content and higher internal coherence compared to baseline summaries. The uncertainty-aware framework introduces measurable variation in both plausibility and confidence estimates across different events. Reports generated from uncertainty-conditioned inputs show distinct semantic structures compared to baseline summaries. Some uncertainty-conditioned reports exhibit higher internal coherence, indicating more focused information organization.

The authors evaluate an uncertainty-aware framework for crisis reporting by conditioning generated summaries on tweet-level uncertainty estimates to assess its impact on situational claims and downstream composition. The experiments validate whether integrating structured uncertainty profiles meaningfully alters the semantic content and organizational coherence of automated reports. Qualitative results demonstrate that uncertainty conditioning successfully differentiates claims and produces distinct semantic structures, though the degree of divergence from baseline summaries remains event-dependent and influenced by dominant thematic content.


KI mit KI entwickeln

Von der Idee bis zum Launch – beschleunigen Sie Ihre KI-Entwicklung mit kostenlosem KI-Co-Coding, sofort einsatzbereiter Umgebung und bestem GPU-Preis.

KI-gestütztes kollaboratives Programmieren
Sofort einsatzbereite GPUs
Die besten Preise

HyperAI Newsletters

Abonnieren Sie unsere neuesten Updates
Wir werden die neuesten Updates der Woche in Ihren Posteingang liefern um neun Uhr jeden Montagmorgen
Unterstützt von MailChimp