NEXUS: An Agentic Framework for Time Series Forecasting

Sarkar Snigdha Sarathi Das Palash Goyal Mihir Parmar Nanyun Peng Vishy Tirumalasetty Chun-Liang Li Rui Zhang Jinsung Yoon Tomas Pfister

Abstract

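Time series forecasting often requires more than extrapolating numbers: it calls for reasoning over unstructured contextual data such as news and events. Specialized Time Series Foundation Models (TSFMs) excel at forecasting from numerical patterns but remain blind to real-world textual signals. Large Language Models (LLMs), by contrast, are emerging as zero-shot forecasters, yet their performance varies across domains and their contextual grounding is uneven. To bridge this gap, we propose NEXUS, a multi-agent forecasting framework. NEXUS decomposes forecasting into specialized stages: it isolates macro-level and micro-level temporal fluctuations, integrates contextual information where available, and then synthesizes a final forecast. This decomposition lets NEXUS adapt flexibly from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting. We show that current-generation LLMs possess stronger intrinsic forecasting ability than previously recognized, and that this ability depends on how numerical and contextual reasoning are organized. Evaluated rigorously on data that falls strictly after the LLMs' training knowledge cutoffs (for example, Zillow real estate metrics and volatile stock market equities), NEXUS consistently matches or outperforms state-of-the-art TSFMs and strong LLM baselines. Beyond numerical accuracy, NEXUS produces high-quality reasoning traces that make the underlying drivers of each forecast explicit. Our results establish that real-world forecasting is an agentic reasoning problem that extends beyond sequence modeling alone.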

One-sentence Summary

The authors introduce NEXUS, a multi-agent framework that decomposes time series forecasting into macro-level and micro-level temporal analysis plus contextual integration, which lets it adapt to volatile, event-driven information without external statistical anchors; on Zillow real estate metrics and volatile stock market equities dated strictly after the LLMs' knowledge cutoffs, it matches or outperforms state-of-the-art TSFMs and strong LLM baselines while producing reasoning traces that expose the drivers behind each forecast.

Key Contributions

  • The paper introduces NEXUS, a multi-agent forecasting framework that decomposes prediction into specialized stages to isolate macro and micro fluctuations before integrating contextual information. This decomposition enables adaptation from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting.
  • Evaluated on Zillow real estate metrics and volatile stock market equities, using only data dated strictly after the LLMs' knowledge cutoffs, NEXUS consistently matches or outperforms state-of-the-art TSFMs and strong LLM baselines. These findings indicate that current-generation LLMs possess stronger intrinsic forecasting ability than previously recognized when numerical and contextual reasoning are organized effectively.
  • Beyond numerical accuracy, the framework produces high-quality reasoning traces that explicitly show the fundamental drivers behind each forecast. The authors take this as evidence that real-world forecasting is an agentic reasoning problem extending beyond sequence modeling alone.

Introduction

Time series forecasting in high-stakes domains requires synthesizing numerical patterns with unstructured contextual signals such as news or economic events. Existing Time Series Foundation Models excel at capturing seasonal trends but operate on numbers alone, ignoring the qualitative drivers of structural breaks. Large Language Models, meanwhile, can parse textual context yet lack the intrinsic autoregressive mechanisms needed for precise numerical pattern recognition. The authors introduce NEXUS, a multi-agent framework that decomposes prediction into specialized stages to isolate macro-level fluctuations and micro-level event catalysts. It integrates contextual information before synthesizing a final forecast and uses a domain-level calibration loop to adapt its synthesis strategy, which lets the underlying LLMs draw on stronger intrinsic forecasting ability than a single monolithic prompt elicits. The system matches or outperforms state-of-the-art foundation models on volatile stock and real estate datasets while producing interpretable reasoning traces for each prediction.

Method

The NEXUS framework addresses the limitations of existing approaches by decomposing the forecasting task into a multi-agent system that integrates multimodal context with rigorous time-series reasoning. Unlike traditional Time Series Foundation Models (TSFMs) which lack interpretability and context, or standard LLM-based forecasting which often fails to capture time-series properties accurately, NEXUS leverages a specialized architecture to produce both robust numerical predictions and explicit reasoning. Refer to the framework diagram below for a high-level comparison of these paradigms.

The authors structure the NEXUS framework into three distinct logical stages: Contextualization, Dual-Resolution Forecast Outlook Generation, and Forecast Synthesis and Calibration. This systematic breakdown allows the model to process raw multimodal data, project future outlooks across different resolutions, and finally synthesize these perspectives into a final forecast. The detailed workflow of this multi-agent system is illustrated in the figure below.

Stage 1: Contextualization. To prevent cognitive overload and ensure the model tracks critical information within long sequences, the framework employs a Historical Context Agent ($\mathcal{A}_{\text{ctx}}$). This agent acts as a mapping function $\mathcal{A}_{\text{ctx}}(\mathbf{X}_{1:\tau}, \mathbf{E}_{1:\tau}) \rightarrow \mathbf{H}_{1:\tau}$ that transforms raw multimodal context and basic time-series features into a highly structured chronological timeline $\mathbf{H}_{1:\tau}$. For each timestep $t$, the agent analyzes the numerical value $x_t$ alongside external textual information $e_t$ to identify the most important factors driving the change. Rather than generating a generic summary, $\mathcal{A}_{\text{ctx}}$ constructs a step-by-step list where each element $h_t \in \mathbf{H}_{1:\tau}$ explicitly links the value with a concise summary of key driving factors. This process filters out noise and ensures downstream agents receive a high-fidelity signal of cause and effect.
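As a rough illustration of the timeline structure described in Stage 1, the Python sketch below pairs each observed value with a short driver summary. The names `TimelineEntry`, `build_timeline`, `summarize`, and `toy_summarize` are hypothetical and do not come from the source. `summarize` stands in for the LLM-backed agent, whose prompt and output format the source does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class TimelineEntry:
    """One element h_t: the value at step t plus its key drivers."""
    t: int
    value: float
    drivers: str


def build_timeline(
    values: Sequence[float],
    events: Sequence[str],
    summarize: Callable[[float, str], str],
) -> List[TimelineEntry]:
    """Map (X_{1:tau}, E_{1:tau}) to a chronological list H_{1:tau}.

    `summarize` is a hypothetical stand-in for the LLM-backed agent.
    """
    if len(values) != len(events):
        raise ValueError("values and events must have the same length")
    return [
        TimelineEntry(t=t, value=x, drivers=summarize(x, e))
        for t, (x, e) in enumerate(zip(values, events), start=1)
    ]


def toy_summarize(value: float, event: str) -> str:
    # Placeholder only: a real agent would condense the text into key drivers.
    return event.strip() or "no notable event"


timeline = build_timeline(
    [101.0, 98.5], ["Rate cut announced ", ""], toy_summarize
)
```

Keeping one entry per timestep, rather than a single free-form summary, mirrors the step-by-step list the paragraph describes and gives downstream agents a uniform record to read.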

Stage 2: Dual-Resolution Forecast Outlook Generation. A robust forecast requires analyzing the time series across multiple temporal resolutions to balance overarching trends with short-term volatility. NEXUS generates two complementary outlooks from the structured history $\mathbf{H}_{1:\tau}$. The Macro-Reasoning Agent ($\mathcal{A}_{\text{macro}}$) takes a top-down approach to map out a broad trajectory for the entire forecast horizon $T$. Formally, it acts as a mapping $\mathcal{A}_{\text{macro}}(\mathbf{H}_{1:\tau}) \rightarrow (\mathbf{X}_{\tau+1:\tau+T}^{\text{macro}}, \mathbf{R}^{\text{macro}})$, establishing the expected regime and ensuring the forecast aligns with fundamental shifts. Conversely, the Micro-Reasoning Agent ($\mathcal{A}_{\text{micro}}$) adopts a granular approach, evaluating immediate catalysts and localized volatility for every single future timestep $t \in [\tau + 1, \tau + T]$. It outputs specific reasoning $r_{t}^{\text{micro}}$ and a corresponding numerical value $x_{t}^{\text{micro}}$ for each step, ensuring responsiveness to immediate events.
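The two outlooks differ mainly in the shape of their output: one trajectory-level result from the macro agent versus one value-and-reason pair per future step from the micro agent. The sketch below expresses that contract as Python types with a small consistency check. All names (`MacroOutlook`, `MicroStep`, `check_outlooks`) are hypothetical, and representing the macro rationale as a single string is an assumption, since the source does not say how it is structured.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MacroOutlook:
    values: List[float]  # macro trajectory over steps tau+1 .. tau+T
    rationale: str       # macro rationale (assumed: one string)


@dataclass(frozen=True)
class MicroStep:
    value: float    # x_t^micro for one future step
    reasoning: str  # r_t^micro for the same step


def check_outlooks(
    macro: MacroOutlook, micro: List[MicroStep], horizon: int
) -> None:
    """Raise ValueError unless both outlooks cover exactly `horizon` steps."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if len(macro.values) != horizon:
        raise ValueError("macro outlook must contain one value per step")
    if len(micro) != horizon:
        raise ValueError("micro outlook must contain one entry per step")
    if any(not step.reasoning.strip() for step in micro):
        raise ValueError("every micro step needs its own reasoning")
```

A check of this kind would run before synthesis so that a malformed agent response fails early instead of silently misaligning the two outlooks.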

Stage 3: Forecast Synthesis and Calibration. The final stage merges the dual perspectives and refines the strategy through a calibration loop. The Forecast Synthesizer Agent ($\mathcal{A}_{\text{syn}}$) computes the final forecast by dynamically evaluating and merging the macro and micro perspectives. It synthesizes the structured history with the dual outlooks conditioned on learned guidelines $\mathcal{G}$, acting as a mapping $\mathcal{A}_{\text{syn}}(\mathbf{H}_{1:\tau}, \mathbf{X}^{\text{macro}}, \mathbf{R}^{\text{macro}}, \mathbf{X}^{\text{micro}}, \mathbf{R}^{\text{micro}}, \mathcal{G}) \rightarrow (\mathbf{X}_{\tau+1:\tau+T}, \mathbf{R})$. To adapt to different domains without manual instruction design, the framework employs a Calibration Agent ($\mathcal{A}_{\text{calib}}$) within a forward-simulation backtesting mechanism. Historical data is divided into $n$ sequential splits, where the agent analyzes prediction errors in training folds to generate critique rules $\mathcal{G}_i$. These rules are intersected to produce robust master guidelines $\mathcal{G} = \bigcap_{i=1}^{n-1} \mathcal{G}_i$, which are validated against a hidden set to ensure they improve performance without overfitting.
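The calibration loop can be sketched in two pieces: cutting the history into sequential splits and intersecting the per-fold rule sets. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, the splits are assumed to be contiguous and near-equal in size, and rules are modeled as plain strings matched exactly, whereas LLM-generated critiques would likely need some form of semantic matching that the source does not describe.

```python
from typing import Iterable, List, Sequence, Set


def sequential_splits(series: Sequence[float], n: int) -> List[List[float]]:
    """Cut `series` into n contiguous, chronologically ordered chunks.

    Assumption: near-equal chunk sizes, earlier chunks take the remainder.
    """
    if n < 2 or n > len(series):
        raise ValueError("need 2 <= n <= len(series)")
    base, extra = divmod(len(series), n)
    splits, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        splits.append(list(series[start:start + size]))
        start += size
    return splits


def intersect_guidelines(rule_sets: Iterable[Iterable[str]]) -> Set[str]:
    """Keep only the rules that appear in every fold's rule set."""
    sets = [set(rules) for rules in rule_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy example: three folds of critique rules (made-up strings).
fold_rules = [
    {"damp extreme micro moves", "trust macro trend"},
    {"trust macro trend", "weight recent news"},
    {"trust macro trend"},
]
master = intersect_guidelines(fold_rules)
```

Intersection is a conservative choice: a rule survives only if every training fold independently produced it, which is one way to read the stated goal of guidelines that generalize without overfitting.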

Experiment

The NEXUS framework underwent rigorous zero-shot evaluation on real-world datasets curated after the models' knowledge cutoff to prevent data leakage, comparing performance against specialized time series and chain-of-thought baselines. Experimental results demonstrate that NEXUS consistently outperforms these baselines in both multimodal contextual and purely numerical forecasting while exhibiting superior logical coherence and reasoning quality. Additionally, component analysis confirms that the integration of macro, micro, and calibration agents is essential for capturing temporal dynamics and achieving optimal accuracy.

The authors perform a component analysis to quantify the impact of different agents within the NEXUS framework by comparing the full pipeline against ablated variants. The results demonstrate that disabling the Micro Reasoning, Macro Reasoning, or Calibration agents consistently leads to higher forecasting errors across both the Zillow Real Estate and Stock Market datasets. Removing the Micro Reasoning agent increases error rates, indicating its necessity for capturing granular, short-term volatility. The absence of the Macro Reasoning agent results in the highest error rates among the ablated variants, highlighting its importance for overarching trend guidance. The full NEXUS pipeline consistently achieves the lowest error metrics, confirming that synthesizing macro and micro perspectives with calibration is essential for optimal performance.

The authors compare the NEXUS framework against TimesFM-2.5 and a Chain-of-Thought (CoT) baseline on the Zillow Real Estate and Stock Market datasets. NEXUS yields the lowest forecasting errors across short, medium, and long horizons, and the lowest average MAPE and RMSE of the evaluated methods. The CoT baseline frequently exhibits the highest error rates, particularly in the Zillow Real Estate domain and at longer forecasting horizons.

The authors also evaluate the NEXUS framework against TimesFM-2.5 and a Chain-of-Thought baseline across the real estate and stock market datasets. NEXUS achieves the lowest error metrics in nearly all categories (highlighted by green shading in the table) across short, medium, and long horizons. The Chain-of-Thought baseline shows markedly higher errors, particularly on the Zillow Real Estate dataset (marked in red), where it struggles with complex dynamics.

The authors evaluate the NEXUS framework against a Chain-of-Thought baseline for time series forecasting on the Zillow Real Estate and Stock Market datasets. NEXUS achieves lower error than the baseline at every forecasting horizon, on both Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). The gap is notably larger on the Zillow Real Estate dataset, where the relative error reduction is greatest.
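For reference, the two reported metrics can be computed as follows. The source does not give its exact formulas, so this sketch uses the standard textbook definitions: MAPE as the mean of absolute errors relative to the actual values, expressed in percent, and RMSE as the square root of the mean squared error. The function names are illustrative.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root Mean Square Error, in the units of the series."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


# Toy check: each forecast is off by 10% of the actual value.
print(mape([100.0, 200.0], [110.0, 180.0]))  # 10.0 up to float rounding
```

MAPE is scale-free, which makes it comparable across series with different price levels, while RMSE stays in the units of the series and penalizes large misses more heavily. That difference is presumably why the summary distinguishes relative from absolute error.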

The authors evaluate the NEXUS framework against a Chain-of-Thought baseline for multimodal contextual time series forecasting on the Zillow Real Estate and Stock Market datasets. NEXUS generally outperforms the baseline across short, medium, and long forecasting horizons and achieves lower average error on both datasets. The largest gains appear in the Zillow Real Estate domain, where both percentage and magnitude errors fall substantially. The baseline is competitive on long-term stock market predictions, while NEXUS maintains robust performance at all other tested horizons.

The evaluation assesses the NEXUS framework through component analysis and comparisons against baselines such as TimesFM-2.5 and Chain-of-Thought on the Zillow Real Estate and Stock Market datasets. Ablation studies indicate that removing any agent increases forecasting errors, highlighting the necessity of integrating micro and macro reasoning with calibration for optimal performance. Overall, the full NEXUS pipeline achieves lower average error rates and greater stability than competing methods across nearly all forecasting horizons, the exception being long-term stock market predictions in the multimodal setting, where the Chain-of-Thought baseline is competitive. The gains are most pronounced in the real estate domain.

