NEXUS: An Agentic Framework for Time Series Forecasting
Sarkar Snigdha Sarathi Das Palash Goyal Mihir Parmar Nanyun Peng Vishy Tirumalasetty Chun-Liang Li Rui Zhang Jinsung Yoon Tomas Pfister
Abstract
Time series forecasting is not limited to numerical extrapolation; it often requires reasoning over unstructured contextual data such as news or events. While specialized Time Series Foundation Models (TSFMs) excel at forecasting from numerical patterns, they are blind to real-world textual signals. In contrast, Large Language Models (LLMs) are increasingly emerging as powerful zero-shot forecasters, although their performance varies with domain and contextual grounding. To bridge this gap, we introduce NEXUS, a multi-agent forecasting framework that decomposes prediction into specialized stages: isolating macro-level and micro-level temporal fluctuations, and integrating contextual information when available, before synthesizing the final forecast. This decomposition enables NEXUS to adapt from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting. We show that current-generation LLMs possess stronger intrinsic forecasting ability than previously recognized, and that this ability depends critically on how numerical and contextual reasoning are structured. Evaluated on datasets that strictly postdate LLM knowledge cutoffs, specifically Zillow real estate metrics and volatile stock market equities, NEXUS matches or outperforms state-of-the-art TSFMs and strong LLM baselines. Beyond numerical accuracy, NEXUS generates high-quality reasoning traces that explicitly show the fundamental drivers behind each forecast. Our results establish that real-world forecasting is an agentic reasoning problem that extends well beyond sequence modeling alone.
One-sentence Summary
The authors introduce NEXUS, a multi-agent framework that decomposes time series forecasting into macro-level and micro-level temporal analysis plus contextual integration, so that it can adapt to volatile, event-driven information without external statistical anchors. On data strictly succeeding LLM knowledge cutoffs (Zillow real estate metrics and volatile stock market equities), it matches or outperforms state-of-the-art TSFM and strong LLM baselines while producing reasoning traces that explicitly show the fundamental drivers behind each forecast.
Key Contributions
- The paper introduces NEXUS, a multi-agent forecasting framework that decomposes prediction into specialized stages to isolate macro and micro fluctuations before integrating contextual information. This decomposition enables adaptation from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting.
- Evaluated on data strictly succeeding LLM knowledge cutoffs spanning Zillow real estate metrics and volatile stock market equities, NEXUS consistently matches or outperforms state-of-the-art TSFM and strong LLM baselines. These findings indicate that current-generation LLMs possess stronger intrinsic forecasting ability than previously recognized when numerical and contextual reasoning are organized effectively.
- Beyond numerical accuracy, the framework produces high-quality reasoning traces that explicitly show the fundamental drivers behind each forecast. This output establishes that real-world forecasting is an agentic reasoning problem extending well beyond only sequence modeling.
Introduction
Time series forecasting in high-stakes domains requires synthesizing numerical patterns with unstructured contextual signals such as news or economic events. Existing Time Series Foundation Models excel at capturing seasonal trends but operate in a multimodal vacuum that ignores qualitative drivers of structural breaks. Meanwhile, Large Language Models can parse textual context yet lack the intrinsic autoregressive mechanisms needed for precise numerical pattern recognition. The authors introduce NEXUS, a multi-agent framework that decomposes prediction into specialized stages to isolate macro-level fluctuations and micro-level event catalysts. By integrating contextual information before synthesizing a final forecast and employing a domain-level calibration loop, this approach enables LLMs to leverage stronger intrinsic forecasting abilities. The system consistently outperforms state-of-the-art foundation models on volatile stock and real estate datasets while producing interpretable reasoning traces for each prediction.
Method
The NEXUS framework addresses the limitations of existing approaches by decomposing the forecasting task into a multi-agent system that integrates multimodal context with rigorous time-series reasoning. Unlike traditional Time Series Foundation Models (TSFMs) which lack interpretability and context, or standard LLM-based forecasting which often fails to capture time-series properties accurately, NEXUS leverages a specialized architecture to produce both robust numerical predictions and explicit reasoning. Refer to the framework diagram below for a high-level comparison of these paradigms.
The authors structure the NEXUS framework into three distinct logical stages: Contextualization, Dual-Resolution Forecast Outlook Generation, and Forecast Synthesis and Calibration. This systematic breakdown allows the model to process raw multimodal data, project future outlooks across different resolutions, and finally synthesize these perspectives into a final forecast. The detailed workflow of this multi-agent system is illustrated in the figure below.
Stage 1: Contextualization. To prevent cognitive overload and ensure the model tracks critical information within long sequences, the framework employs a Historical Context Agent ($A_{\text{ctx}}$). This agent acts as a mapping function $A_{\text{ctx}}(X_{1:\tau}, E_{1:\tau}) \to H_{1:\tau}$ that transforms raw multimodal context and basic time-series features into a highly structured chronological timeline $H_{1:\tau}$. For each timestep $t$, the agent analyzes the numerical value $x_t$ alongside external textual information $e_t$ to identify the most important factors driving the change. Rather than generating a generic summary, $A_{\text{ctx}}$ constructs a step-by-step list where each element $h_t \in H_{1:\tau}$ explicitly links the value with a concise summary of key driving factors. This process filters out noise and ensures downstream agents receive a high-fidelity signal of cause and effect.
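The Stage 1 mapping can be pictured with a minimal sketch, assuming each timestep carries one numeric value and an optional text snippet. The names `TimelineEntry`, `build_timeline`, and `placeholder_summarize` are hypothetical and do not come from the paper. In the framework the driver summary would come from an LLM-backed agent; here a trivial rule stands in for it so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class TimelineEntry:
    """One timeline element: the observed value plus its key driving factors."""
    t: int
    value: float
    drivers: str


def build_timeline(
    values: Sequence[float],
    events: Sequence[Optional[str]],
    summarize: Callable[[float, Optional[float], Optional[str]], str],
) -> List[TimelineEntry]:
    """Pair every value with a short driver summary, in chronological order."""
    if len(values) != len(events):
        raise ValueError("values and events must have the same length")
    timeline = []
    for i, (x, e) in enumerate(zip(values, events)):
        prev = values[i - 1] if i > 0 else None
        timeline.append(TimelineEntry(t=i + 1, value=x, drivers=summarize(x, prev, e)))
    return timeline


def placeholder_summarize(x, prev, event):
    """Stand-in for the LLM call (assumption): direction of change plus raw event text."""
    if prev is None:
        direction = "start"
    elif x > prev:
        direction = "up"
    elif x < prev:
        direction = "down"
    else:
        direction = "flat"
    return f"{direction}: {event}" if event else direction


timeline = build_timeline(
    [100.0, 103.0, 101.5],
    [None, "rate cut announced", None],
    placeholder_summarize,
)
for h in timeline:
    print(h.t, h.value, h.drivers)
```

Passing the summarizer in as a callable keeps the timeline structure separate from whatever model produces the text, which is one plausible way to let downstream agents consume the same structured history.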
Stage 2: Dual-Resolution Forecast Outlook Generation. A robust forecast requires analyzing the time series across multiple temporal resolutions to balance overarching trends with short-term volatility. NEXUS generates two complementary outlooks from the structured history $H_{1:\tau}$. The Macro-Reasoning Agent ($A_{\text{macro}}$) takes a top-down approach to map out a broad trajectory for the entire forecast horizon $T$. Formally, it acts as a mapping $A_{\text{macro}}(H_{1:\tau}) \to (X^{\text{macro}}_{\tau+1:\tau+T}, R^{\text{macro}})$, establishing the expected regime and ensuring the forecast aligns with fundamental shifts. Conversely, the Micro-Reasoning Agent ($A_{\text{micro}}$) adopts a granular approach, evaluating immediate catalysts and localized volatility for every single future timestep $t \in [\tau+1, \tau+T]$. It outputs specific reasoning $r_t^{\text{micro}}$ and a corresponding numerical value $x_t^{\text{micro}}$ for each step, ensuring responsiveness to immediate events.
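The difference in output shape between the two outlooks can be sketched as follows: the macro outlook is one trajectory with a single rationale, while the micro outlook is a list of per-step values, each with its own reasoning. Everything here is an assumption made for illustration. The container names are hypothetical, and the two generator functions are simple numeric placeholders (average-slope extrapolation and a damped latest change), not the LLM agents the paper describes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MacroOutlook:
    values: List[float]  # one trajectory covering the whole horizon
    rationale: str       # a single regime-level explanation


@dataclass
class MicroStep:
    value: float         # forecast for one future timestep
    rationale: str       # reasoning specific to that step


def macro_outlook(history: Sequence[float], horizon: int) -> MacroOutlook:
    """Placeholder for the macro agent (assumption): extend the average slope."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    values = [history[-1] + slope * k for k in range(1, horizon + 1)]
    return MacroOutlook(values, f"average slope {slope:+.2f} per step")


def micro_outlook(history: Sequence[float], horizon: int) -> List[MicroStep]:
    """Placeholder for the micro agent (assumption): halve the latest change each step."""
    last_change = history[-1] - history[-2]
    steps, level = [], history[-1]
    for k in range(1, horizon + 1):
        change = last_change * 0.5 ** k
        level += change
        steps.append(MicroStep(level, f"step {k}: latest move damped to {change:+.2f}"))
    return steps


history = [100.0, 102.0, 104.0, 107.0]
macro = macro_outlook(history, horizon=3)
micro = micro_outlook(history, horizon=3)
print(macro.rationale, [round(v, 2) for v in macro.values])
for step in micro:
    print(round(step.value, 3), step.rationale)
```

Both outlooks cover the same horizon, so a later synthesis step can compare them position by position.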
Stage 3: Forecast Synthesis and Calibration. The final stage merges the dual perspectives and refines the strategy through a calibration loop. The Forecast Synthesizer Agent ($A_{\text{syn}}$) computes the final forecast by dynamically evaluating and merging the macro and micro perspectives. It synthesizes the structured history with the dual outlooks conditioned on learned guidelines $G$, acting as a mapping $A_{\text{syn}}(H_{1:\tau}, X^{\text{macro}}, R^{\text{macro}}, X^{\text{micro}}, R^{\text{micro}}, G) \to (X_{\tau+1:\tau+T}, R)$. To adapt to different domains without manual instruction design, the framework employs a Calibration Agent ($A_{\text{calib}}$) within a forward-simulation backtesting mechanism. Historical data is divided into $n$ sequential splits, where the agent analyzes prediction errors in training folds to generate critique rules $G_i$. These rules are intersected to produce robust master guidelines $G = \bigcap_{i=1}^{n-1} G_i$, which are validated against a hidden set to ensure they improve performance without overfitting.
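The rule-intersection step of the calibration loop can be sketched as below. This is a minimal illustration under stated assumptions: the first n-1 chronological splits are treated as training folds and the last split as the hidden set, and `placeholder_critique` is a hypothetical stand-in that derives rule strings from naive persistence-forecast errors rather than from an LLM critique. The validation pass on the hidden set is not shown.

```python
from typing import Callable, List, Sequence, Set


def sequential_splits(series: Sequence[float], n: int) -> List[List[float]]:
    """Cut the series into n contiguous, chronologically ordered chunks."""
    if n < 2 or len(series) < n:
        raise ValueError("need n >= 2 and at least n observations")
    size, rem = divmod(len(series), n)
    splits, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        splits.append(list(series[start:end]))
        start = end
    return splits


def master_guidelines(
    splits: List[List[float]],
    critique: Callable[[List[float]], Set[str]],
) -> Set[str]:
    """Intersect the rule sets of the first n-1 folds; the last split is held out."""
    rule_sets = [critique(fold) for fold in splits[:-1]]
    return set.intersection(*rule_sets)


def placeholder_critique(fold: List[float]) -> Set[str]:
    """Stand-in for the Calibration Agent (assumption): rules from naive-forecast errors."""
    errors = [b - a for a, b in zip(fold, fold[1:])]  # persistence forecast error
    rules = set()
    if sum(errors) > 0:
        rules.add("persistence under-predicts: lean forecasts upward")
    if max(errors) - min(errors) > 2.0:
        rules.add("step changes vary widely: weight the micro outlook more")
    return rules


series = [100, 101, 103, 104, 106, 112, 113, 114, 116, 117, 119, 120]
splits = sequential_splits(series, 4)
G = master_guidelines(splits, placeholder_critique)
print(G)
```

In this toy series only the second fold contains a large jump, so the volatility rule appears in one fold and is dropped by the intersection. That is the intended effect of intersecting: a rule survives only if every training fold supports it.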
Experiment
The NEXUS framework underwent rigorous zero-shot evaluation on real-world datasets curated after the models' knowledge cutoff to prevent data leakage, comparing performance against specialized time series and chain-of-thought baselines. Experimental results demonstrate that NEXUS consistently outperforms these baselines in both multimodal contextual and purely numerical forecasting while exhibiting superior logical coherence and reasoning quality. Additionally, component analysis confirms that the integration of macro, micro, and calibration agents is essential for capturing temporal dynamics and achieving optimal accuracy.
The authors perform a component analysis to quantify the impact of different agents within the NEXUS framework by comparing the full pipeline against ablated variants. The results demonstrate that disabling the Micro Reasoning, Macro Reasoning, or Calibration agents consistently leads to higher forecasting errors across both the Zillow Real Estate and Stock Market datasets. Removing the Micro Reasoning agent increases error rates, indicating its necessity for capturing granular, short-term volatility. The absence of the Macro Reasoning agent results in the highest error rates among the ablated variants, highlighting its importance for overarching trend guidance. The full NEXUS pipeline consistently achieves the lowest error metrics, confirming that synthesizing macro and micro perspectives with calibration is essential for optimal performance.
The authors compare the NEXUS framework against TimesFM-2.5 and a Chain-of-Thought (CoT) baseline on the Zillow Real Estate and Stock Market datasets. NEXUS yields the lowest forecasting errors on both datasets across short, medium, and long horizons, and it achieves the lowest average MAPE and RMSE of the evaluated methods. The CoT baseline frequently exhibits the highest error rates, particularly in the Zillow Real Estate domain and at longer forecasting horizons.
The authors evaluate the NEXUS framework against TimesFM-2.5 and a Chain-of-Thought baseline across real estate and stock market datasets. NEXUS achieves the lowest error metrics in nearly all categories (highlighted by green shading in the table), with consistent improvements over the baseline at short, medium, and long horizons. The Chain-of-Thought baseline exhibits significantly higher errors, particularly on the Zillow Real Estate dataset (marked in red), where it struggles with complex dynamics.
The authors evaluate the NEXUS framework against a Chain-of-Thought baseline for time series forecasting on the Zillow Real Estate and Stock Market datasets. NEXUS achieves lower error rates than the baseline at every forecasting horizon, for both Mean Absolute Percentage Error (MAPE, a relative metric) and Root Mean Square Error (RMSE, an absolute metric). The gap is notably larger on the Zillow Real Estate dataset, where the relative reduction in error is significantly higher.
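For reference, the two error metrics named in these comparisons can be computed as in the sketch below, which uses the standard textbook definitions. Reporting MAPE in percent rather than as a fraction is an assumption here, and the sample numbers are invented for illustration, not taken from the paper.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent; undefined if any actual value is zero."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root Mean Square Error, expressed in the units of the series itself."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


actual = [100.0, 200.0, 400.0]      # illustrative values only
forecast = [110.0, 190.0, 400.0]
print(round(mape(actual, forecast), 3))
print(round(rmse(actual, forecast), 3))
```

MAPE is scale-free, which makes it comparable between home-price indices and stock prices, while RMSE penalizes large misses more heavily and depends on the magnitude of the series.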
The authors evaluate the NEXUS framework against a Chain-of-Thought (CoT) baseline for multimodal contextual time series forecasting on the Zillow Real Estate and Stock Market datasets. NEXUS generally outperforms the baseline across short, medium, and long forecasting horizons and achieves lower average error rates on both datasets. The gains are largest in the Zillow Real Estate domain, where both percentage and magnitude errors drop significantly. The one exception is long-term stock market prediction, where the baseline performs competitively.
The evaluation assesses the NEXUS framework through component analysis and comparisons against baselines such as TimesFM-2.5 and Chain-of-Thought on Zillow Real Estate and Stock Market datasets. Ablation studies indicate that removing any agent increases forecasting errors, highlighting the necessity of integrating micro and macro reasoning with calibration for optimal performance. Consequently, the full NEXUS pipeline achieves lower average error rates and greater stability than competing methods, with particularly significant gains in the real estate domain. The main exception is long-horizon stock market forecasting in the multimodal setting, where the Chain-of-Thought baseline is competitive.