NEXUS: An Agentic Framework for Time Series Forecasting
Sarkar Snigdha Sarathi Das Palash Goyal Mihir Parmar Nanyun Peng Vishy Tirumalasetty Chun-Liang Li Rui Zhang Jinsung Yoon Tomas Pfister
Abstract
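Time series forecasting is more than numerical extrapolation; it often requires reasoning over unstructured contextual data such as news and events. Specialized time series foundation models (TSFMs) excel at forecasting from numerical patterns but are blind to real-world textual signals. Large language models (LLMs), meanwhile, are emerging as zero-shot forecasters, yet their performance varies across domains and with how well they are grounded in context. To bridge this gap, we propose NEXUS, a multi-agent forecasting framework that decomposes forecasting into specialized stages. Specifically, it isolates macro-level and micro-level temporal fluctuations and integrates contextual information when available before synthesizing the final forecast. This decomposition lets NEXUS adapt from seasonal signals to volatile, event-driven information without relying on external statistical anchors or a monolithic prompt. Our findings suggest that current-generation LLMs have stronger intrinsic forecasting ability than previously recognized, depending on how numerical and contextual reasoning are organized. Evaluated on datasets created after the LLMs' knowledge cutoffs, including Zillow real estate metrics and volatile stock market equities, NEXUS consistently matches or outperforms state-of-the-art TSFMs and strong LLM baselines. Beyond numerical accuracy, NEXUS produces high-quality reasoning traces that make explicit the fundamental drivers behind each forecast. Our results show that real-world forecasting is an agentic reasoning problem that goes beyond sequence modeling alone.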
One-sentence Summary
The authors introduce NEXUS, a multi-agent framework that decomposes time series forecasting into macro-level and micro-level temporal reasoning plus contextual integration, adapting to volatile, event-driven information without external statistical anchors; on data collected strictly after LLM knowledge cutoffs (Zillow real estate metrics and volatile stock market equities), it matches or outperforms state-of-the-art TSFMs and strong LLM baselines while producing reasoning traces that make explicit the fundamental drivers behind each forecast.
Key Contributions
- The paper introduces NEXUS, a multi-agent forecasting framework that decomposes prediction into specialized stages to isolate macro and micro fluctuations before integrating contextual information. This decomposition enables adaptation from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting.
- Evaluated on data collected strictly after LLM knowledge cutoffs, spanning Zillow real estate metrics and volatile stock market equities, NEXUS consistently matches or outperforms state-of-the-art TSFMs and strong LLM baselines. These findings indicate that current-generation LLMs possess stronger intrinsic forecasting ability than previously recognized when numerical and contextual reasoning are organized effectively.
- Beyond numerical accuracy, the framework produces high-quality reasoning traces that explicitly show the fundamental drivers behind each forecast. This output supports the view that real-world forecasting is an agentic reasoning problem extending well beyond sequence modeling alone.
Introduction
Time series forecasting in high-stakes domains requires synthesizing numerical patterns with unstructured contextual signals such as news or economic events. Existing Time Series Foundation Models (TSFMs) excel at capturing seasonal trends but operate on numbers alone, ignoring the qualitative drivers of structural breaks. Meanwhile, Large Language Models (LLMs) can parse textual context yet lack the intrinsic autoregressive mechanisms needed for precise numerical pattern recognition. The authors introduce NEXUS, a multi-agent framework that decomposes prediction into specialized stages to isolate macro-level fluctuations and micro-level event catalysts. By integrating contextual information before synthesizing a final forecast and employing a domain-level calibration loop, this approach lets LLMs draw on stronger intrinsic forecasting abilities than previously recognized. The system matches or outperforms state-of-the-art foundation models on volatile stock and real estate datasets while producing interpretable reasoning traces for each prediction.
Method
The NEXUS framework addresses the limitations of existing approaches by decomposing the forecasting task into a multi-agent system that integrates multimodal context with rigorous time-series reasoning. Unlike traditional Time Series Foundation Models (TSFMs) which lack interpretability and context, or standard LLM-based forecasting which often fails to capture time-series properties accurately, NEXUS leverages a specialized architecture to produce both robust numerical predictions and explicit reasoning. Refer to the framework diagram below for a high-level comparison of these paradigms.
The authors structure the NEXUS framework into three distinct logical stages: Contextualization, Dual-Resolution Forecast Outlook Generation, and Forecast Synthesis and Calibration. This systematic breakdown allows the model to process raw multimodal data, project future outlooks across different resolutions, and finally synthesize these perspectives into a final forecast. The detailed workflow of this multi-agent system is illustrated in the figure below.
Stage 1: Contextualization
To prevent cognitive overload and ensure the model tracks critical information within long sequences, the framework employs a Historical Context Agent ($A_{\text{ctx}}$). This agent acts as a mapping $A_{\text{ctx}}(X_{1:\tau}, E_{1:\tau}) \to H_{1:\tau}$ that transforms raw multimodal context and basic time-series features into a highly structured chronological timeline $H_{1:\tau}$. For each timestep $t$, the agent analyzes the numerical value $x_t$ alongside external textual information $e_t$ to identify the most important factors driving the change. Rather than generating a generic summary, $A_{\text{ctx}}$ constructs a step-by-step list where each element $h_t \in H_{1:\tau}$ explicitly links the value with a concise summary of key driving factors. This process filters out noise and ensures downstream agents receive a high-fidelity signal of cause and effect.
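The shape of the timeline $H_{1:\tau}$ can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `TimelineEntry`, `build_timeline`, and `truncate_summary` do not come from the paper, and the `summarize` callable is only a placeholder for the LLM call that the Historical Context Agent would make.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class TimelineEntry:
    """One element h_t: the observed value plus a short note on its drivers."""
    t: int
    value: float
    drivers: str


def build_timeline(
    values: Sequence[float],
    events: Sequence[str],
    summarize: Callable[[float, str], str],
) -> List[TimelineEntry]:
    """Map (X_{1:tau}, E_{1:tau}) to H_{1:tau}, one entry per timestep."""
    if len(values) != len(events):
        raise ValueError("values and events must have the same length")
    return [
        TimelineEntry(t=t, value=x, drivers=summarize(x, e))
        for t, (x, e) in enumerate(zip(values, events), start=1)
    ]


def truncate_summary(value: float, event_text: str) -> str:
    """Hypothetical stand-in for the LLM: keep only the first sentence."""
    first = event_text.split(".")[0].strip()
    return first or "no notable event"


timeline = build_timeline(
    [101.2, 98.7],
    ["Rates held steady. Markets calm.", ""],
    truncate_summary,
)
```

Keeping one entry per timestep, rather than a single free-form summary, mirrors the step-by-step list described above, so a downstream consumer can always pair a value with its stated driver.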
Stage 2: Dual-Resolution Forecast Outlook Generation
A robust forecast requires analyzing the time series across multiple temporal resolutions to balance overarching trends with short-term volatility. NEXUS generates two complementary outlooks from the structured history $H_{1:\tau}$. The Macro-Reasoning Agent ($A_{\text{macro}}$) takes a top-down approach to map out a broad trajectory for the entire forecast horizon $T$. Formally, it acts as a mapping $A_{\text{macro}}(H_{1:\tau}) \to (X^{\text{macro}}_{\tau+1:\tau+T}, R^{\text{macro}})$, establishing the expected regime and ensuring the forecast aligns with fundamental shifts. Conversely, the Micro-Reasoning Agent ($A_{\text{micro}}$) adopts a granular approach, evaluating immediate catalysts and localized volatility for every single future timestep $t \in [\tau+1, \tau+T]$. It outputs specific reasoning $r_t^{\text{micro}}$ and a corresponding numerical value $x_t^{\text{micro}}$ for each step, ensuring responsiveness to immediate events.
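The difference between the two outlooks is mainly one of output shape: a single trajectory with one rationale for the macro view, and one value-reasoning pair per step for the micro view. The sketch below shows only that contrast. The function names are hypothetical, and the arithmetic inside them (a persistent average slope, a halving of the latest move) is a deliberately simple placeholder, not the LLM reasoning the paper's agents perform.

```python
from typing import List, Sequence, Tuple


def macro_outlook(history: Sequence[float], horizon: int) -> Tuple[List[float], str]:
    """Top-down stand-in: one trajectory for the whole horizon plus one rationale."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    slope = (history[-1] - history[0]) / (len(history) - 1)
    path = [history[-1] + slope * k for k in range(1, horizon + 1)]
    return path, f"average slope {slope:+.2f} per step is assumed to persist"


def micro_outlook(history: Sequence[float], horizon: int) -> List[Tuple[float, str]]:
    """Step-by-step stand-in: one (value, reasoning) pair per future timestep."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    steps = []
    last, delta = history[-1], history[-1] - history[-2]
    for k in range(1, horizon + 1):
        delta *= 0.5  # assumed decay of the most recent move (illustrative only)
        last += delta
        steps.append((last, f"step {k}: latest move decays to {delta:+.2f}"))
    return steps


x_macro, r_macro = macro_outlook([10.0, 12.0, 16.0], horizon=2)
micro_steps = micro_outlook([10.0, 12.0, 16.0], horizon=2)
```

Returning the rationale alongside the numbers in both functions reflects the paired outputs $(X^{\text{macro}}, R^{\text{macro}})$ and $(x_t^{\text{micro}}, r_t^{\text{micro}})$ described above.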
Stage 3: Forecast Synthesis and Calibration
The final stage merges the dual perspectives and refines the strategy through a calibration loop. The Forecast Synthesizer Agent ($A_{\text{syn}}$) computes the final forecast by dynamically evaluating and merging the macro and micro perspectives. It synthesizes the structured history with the dual outlooks conditioned on learned guidelines $G$, acting as a mapping $A_{\text{syn}}(H_{1:\tau}, X^{\text{macro}}, R^{\text{macro}}, X^{\text{micro}}, R^{\text{micro}}, G) \to (X_{\tau+1:\tau+T}, R)$. To adapt to different domains without manual instruction design, the framework employs a Calibration Agent ($A_{\text{calib}}$) within a forward-simulation backtesting mechanism. Historical data is divided into $n$ sequential splits, where the agent analyzes prediction errors in training folds to generate critique rules $G_i$. These rules are intersected to produce robust master guidelines $G = \bigcap_{i=1}^{n-1} G_i$, which are validated against a hidden set to ensure they improve performance without overfitting.
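The intersection step $G = \bigcap_{i=1}^{n-1} G_i$ can be sketched in a few lines of Python. This is an illustration under stated assumptions: rules are modeled as plain strings, the last of the $n$ splits is treated as the hidden validation set, and `critique` is a hypothetical callable standing in for the Calibration Agent. The summary does not specify how rules are represented or matched, so exact string equality is used here.

```python
from typing import Callable, List, Sequence, Set, Tuple


def sequential_splits(series: Sequence[float], n: int) -> List[Sequence[float]]:
    """Cut the history into n contiguous, chronologically ordered chunks."""
    if not 1 <= n <= len(series):
        raise ValueError("n must be between 1 and len(series)")
    size, extra = divmod(len(series), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(series[start:end])
        start = end
    return chunks


def master_guidelines(rule_sets: Sequence[Set[str]]) -> Set[str]:
    """Keep only the rules that every training fold produced."""
    if not rule_sets:
        return set()
    return set.intersection(*map(set, rule_sets))


def calibrate(
    series: Sequence[float],
    n: int,
    critique: Callable[[Sequence[float]], Set[str]],
) -> Tuple[Set[str], Sequence[float]]:
    """Return (G, hidden split); the caller validates G on the hidden split."""
    if n < 2:
        raise ValueError("need at least one training fold and one hidden split")
    *train_folds, hidden = sequential_splits(series, n)
    return master_guidelines([critique(fold) for fold in train_folds]), hidden


rules, hidden = calibrate(
    list(range(10)),
    4,
    lambda fold: {"damp trend", f"len{len(fold)}"},  # toy critique, not an LLM
)
```

Intersecting rather than unioning the per-fold rule sets is the conservative choice the text describes: a rule survives only if every training fold supports it, which limits guidelines that fit one period by chance.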
Experiment
The NEXUS framework underwent rigorous zero-shot evaluation on real-world datasets curated after the models' knowledge cutoff to prevent data leakage, comparing performance against specialized time series and chain-of-thought baselines. Experimental results demonstrate that NEXUS consistently outperforms these baselines in both multimodal contextual and purely numerical forecasting while exhibiting superior logical coherence and reasoning quality. Additionally, component analysis confirms that the integration of macro, micro, and calibration agents is essential for capturing temporal dynamics and achieving optimal accuracy.
The authors perform a component analysis to quantify the impact of different agents within the NEXUS framework by comparing the full pipeline against ablated variants. The results demonstrate that disabling the Micro Reasoning, Macro Reasoning, or Calibration agents consistently leads to higher forecasting errors across both the Zillow Real Estate and Stock Market datasets.
- Removing the Micro Reasoning agent increases error rates, indicating its necessity for capturing granular, short-term volatility.
- The absence of the Macro Reasoning agent results in the highest error rates among the ablated variants, highlighting its importance for overarching trend guidance.
- The full NEXUS pipeline consistently achieves the lowest error metrics, confirming that synthesizing macro and micro perspectives with calibration is essential for optimal performance.
The authors compare the NEXUS framework against TimesFM-2.5 and a Chain-of-Thought (CoT) baseline on Zillow Real Estate and Stock Market datasets. The results demonstrate that NEXUS consistently yields the lowest forecasting errors across short, medium, and long horizons. In contrast, the CoT Baseline frequently exhibits the highest error rates, particularly in the Zillow domain.
- NEXUS consistently achieves the lowest error rates across both Zillow and Stock Market datasets compared to TimesFM-2.5 and the CoT Baseline.
- The CoT Baseline exhibits the highest error rates, particularly in the Zillow Real Estate domain and for longer forecasting horizons.
- NEXUS demonstrates consistent stability, achieving the lowest average MAPE and RMSE values compared to the other evaluated methods.
The authors evaluate the NEXUS framework against TimesFM-2.5 and a Chain-of-Thought baseline across real estate and stock market datasets. The results demonstrate that NEXUS achieves the lowest forecasting errors in nearly all categories across short, medium, and long horizons, significantly outperforming the baselines. This superiority is particularly evident in the real estate dataset, where the baseline model struggles with complex dynamics.
- NEXUS achieves the lowest error metrics across nearly all categories, highlighted by green shading in the table.
- The Chain-of-Thought baseline exhibits significantly higher errors, particularly in the Zillow Real Estate dataset, where it is marked in red.
- The framework maintains consistent performance improvements over the baseline for both short-term and long-term forecasting horizons.
The authors evaluate the NEXUS framework against a Chain-of-Thought baseline for time series forecasting on Zillow Real Estate and Stock Market datasets. The results show that NEXUS consistently achieves lower error rates across all forecasting horizons compared to the baseline. The performance gap is notably larger in the Zillow Real Estate dataset, where the framework demonstrates superior accuracy in both relative and absolute error metrics.
- NEXUS consistently outperforms the Chain-of-Thought baseline across all forecasting horizons.
- The framework achieves significantly higher relative improvement in error reduction for the Zillow Real Estate dataset.
- Superior performance is maintained for both Mean Absolute Percentage Error and Root Mean Square Error metrics.
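For readers unfamiliar with the two error metrics named above, the following sketch gives their standard textbook definitions in Python. The function names are chosen here for illustration, and this summary does not state how the paper aggregates errors across series or horizons, so the code covers only a single actual-versus-forecast comparison.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent; actual values must be non-zero."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root Mean Square Error, in the same units as the series."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


example_mape = mape([100.0, 200.0], [110.0, 180.0])
example_rmse = rmse([100.0, 200.0], [110.0, 180.0])
```

MAPE is scale-free, which makes it the relative metric when comparing home-value indices with stock prices, while RMSE stays in the units of the series and penalizes large misses more heavily.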
The authors evaluate the NEXUS framework against a Chain-of-Thought baseline for multimodal contextual time series forecasting on Zillow Real Estate and Stock Market datasets. Results indicate that NEXUS generally outperforms the baseline across short, medium, and long forecasting horizons, with the most significant gains observed in the Zillow Real Estate domain. Although the baseline shows competitive performance on long-term stock market predictions, the NEXUS framework achieves superior overall average accuracy for both datasets.
- NEXUS achieves lower average error rates than the CoT Baseline across both real estate and stock market datasets.
- The framework shows particularly strong improvements in the Zillow Real Estate domain, significantly reducing both percentage and magnitude errors.
- While the baseline performs competitively on long-term stock market predictions, the NEXUS framework maintains robust performance across all other tested horizons.
The evaluation assesses the NEXUS framework through component analysis and comparisons against baselines such as TimesFM-2.5 and Chain-of-Thought on Zillow Real Estate and Stock Market datasets. Ablation studies indicate that removing any agent increases forecasting errors, highlighting the necessity of integrating micro and macro reasoning with calibration for optimal performance. Overall, the full NEXUS pipeline achieves lower average error rates and greater stability than competing methods across nearly all forecasting horizons, the exception being long-term stock market predictions in the multimodal setting, where the Chain-of-Thought baseline is competitive. The gains are particularly large in the real estate domain.