HyperAIHyperAI

Command Palette

Search for a command to run...

인사이트 에이전트: 데이터 인사이트를 위한 LLM 기반 다중 에이전트 시스템

Jincheng Bai Zhenyu Zhang Jennifer Zhang Zhihuai Zhu

초록

오늘날 전자상거래 판매자들은 여러 핵심적인 과제에 직면해 있다. 특히, 가용한 프로그램과 도구를 효과적으로 탐색하고 활용하는 데 어려움을 겪하며, 다양한 도구로부터 제공되는 풍부한 데이터를 이해하고 활용하는 데도 어려움을 느끼고 있다. 이를 해결하기 위해 우리는 자동화된 정보 검색을 통해 전자상거래 판매자에게 개인화된 데이터 및 비즈니스 인사이트를 제공할 수 있는 대화형 다중 에이전트 데이터 인사이트 시스템인 ‘인사이트 에이전트(Insight Agent, IA)’를 개발하고자 한다. 본 연구의 가설은 IA가 판매자들에게 ‘강력한 성과 증폭기’ 역할을 하여, 판매자가 보다 적은 노력으로 더 빠르게 우수한 비즈니스 결정을 내릴 수 있도록 함으로써 판매자의 채택률을 지속적으로 높일 수 있을 것이라는 것이다.이 논문에서는 계획-실행(plan-and-execute) 파라다임 기반의 LLM 기반 엔드투엔드 에이전트 시스템을 소개한다. 이 시스템은 포괄적인 커버리지, 높은 정확도, 낮은 지연 시간을 목표로 설계되었으며, 계층적 다중 에이전트 구조를 채택하고 있다. 구체적으로, 매니저 에이전트와 두 개의 워커 에이전트(데이터 시각화 및 인사이트 생성)로 구성되어 있으며, 효율적인 정보 검색과 문제 해결을 지원한다. 매니저 에이전트는 경량 인코더-디코더 모델을 활용한 비도메인(OOD, Out-of-Domain) 탐지 기법과 BERT 기반 분류기를 통한 에이전트 라우팅을 결합한 간단하면서도 효과적인 머신러닝 솔루션을 설계하여 정확도와 지연 시간 모두를 최적화하였다. 두 워커 에이전트 내부에서는 API 기반 데이터 모델을 위한 전략적 계획 방식을 도입하여 사용자 질의를 세밀한 구성 요소로 분해함으로써 보다 정확한 응답을 생성하고, 도메인 지식을 동적으로 주입하여 인사이트 생성기의 성능을 강화하였다.현재 IA는 미국 내 아마존 판매자들을 대상으로 서비스를 출시하여, 인간 평가 기준으로 90%의 높은 정확도를 달성하였으며, 지연 시간(P90)은 15초 이하로 낮은 수준을 유지하고 있다.

One-sentence Summary

Amazon researchers propose Insight Agents (IA), an LLM-powered multi-agent system using plan-and-execute architecture with hierarchical agents and OOD-aware routing, enabling US Amazon sellers to rapidly obtain accurate business insights with 90% human-evaluated accuracy and sub-15s latency.

Key Contributions

  • Insight Agents (IA) is a novel LLM-backed multi-agent system built on a plan-and-execute paradigm, designed to help e-commerce sellers overcome tool discovery and data utilization barriers by delivering personalized, conversational business insights with high coverage, accuracy, and low latency.
  • The system employs a hierarchical architecture with a manager agent that routes queries via OOD detection and BERT-based classification, and two worker agents—data presenter and insight generator—that decompose queries into granular API calls and dynamically inject domain knowledge to improve response accuracy.
  • Deployed for Amazon sellers in the US, IA achieves 90% accuracy via human evaluation and maintains P90 latency under 15 seconds, demonstrating practical effectiveness in real-world e-commerce decision support.

Introduction

The authors leverage a hierarchical multi-agent system powered by LLMs to help e-commerce sellers extract actionable business insights from complex, fragmented data tools—addressing a critical need for faster, less cognitively demanding decision-making. Prior systems often struggled with accuracy, latency, or scope when handling open-ended, domain-specific queries across multiple data sources. Their main contribution is Insight Agents, a plan-and-execute architecture with a manager agent routing queries via OOD detection and BERT classification, and two worker agents that decompose queries and inject domain knowledge to boost precision—all achieving 90% accuracy and sub-15s latency in production.

Dataset

The authors use a curated dataset for training and evaluating OOD detection and agent routing models, composed as follows:

  • Dataset Composition:

    • 301 total questions: 178 in-domain, 123 out-of-domain.
    • In-domain split: 120 for data presenter, 59 for insight generator.
    • A separate benchmarking set of 100 popular questions with ground truth for end-to-end evaluation.
  • Data Augmentation:

    • Raw in-domain subsets (data presenter and insight generator) are augmented via LLM to reach 300 questions each, introducing semantic variations for balanced training.
    • No augmentation applied to out-of-domain or benchmarking sets.
  • Model Usage:

    • The augmented 300-question subsets per agent are used to finetune a lightweight BERT model (“bge-small-en-v1.5”).
    • OOD detection uses a model with hidden layer dimension 64 and hyperparameter λ = 4.
    • Final evaluation on the 100-question benchmark is performed by human auditors using three metrics: Relevance, Correctness, and Completeness.
    • Question-level accuracy is defined as the percentage of questions scoring above 0.8 on all three metrics.
  • Processing & Evaluation:

    • No cropping or metadata construction mentioned.
    • LLM used for augmentation: “anthropic.claude-3-sonnet-20240229-v1:0” via Amazon Bedrock.
    • Metrics for OOD and routing models include precision, recall, and accuracy.

Method

The Insight Agents (IA) system employs a hierarchical manager-worker multi-agent architecture designed to deliver accurate, low-latency responses to seller queries through a plan-and-execute paradigm. The overall framework, illustrated in the figure below, begins with a seller query being processed by the manager agent, which acts as the central orchestrator. This agent performs initial validation and routing before delegating the task to one of two specialized worker agents: the data presenter agent or the insight generator agent. The manager agent includes three primary components: Out-of-Domain (OOD) detection, agent routing, and query augmenter. OOD detection filters queries that fall outside the scope of available data insight, ensuring that only relevant requests proceed. Agent routing determines the appropriate resolution path based on the query type, while the query augmenter resolves ambiguities, particularly around time ranges, by enriching the query with contextual information such as the current date. After processing, the system applies response guardrails to prevent the exposure of sensitive or harmful content before returning the final answer to the seller.

The low-level architecture of the two worker agents, the data presenter and the insight generator, is detailed in the figure below. Both agents share a common data workflow planning and execution pipeline but diverge in their generation strategies. The data presenter agent focuses on retrieving and aggregating tabular data based on the query. Its data workflow planner decomposes the query into executable steps using a chain-of-thought approach, selecting appropriate APIs or functions and generating the necessary input payloads. This process is grounded in a robust data model that leverages the company's internal data APIs, ensuring high accuracy through structured retrieval. The data workflow executor then retrieves the data via the selected APIs, performs any required transformations, and applies post-processing steps such as column renaming and semantic filtering. The final response is generated through standard prompting, guided by few-shot examples to ensure the correct format.

The insight generator agent follows a similar planning and execution structure but is designed to produce analytical insights rather than raw data. Its data workflow planner also performs task decomposition and planning, but it includes an additional step of domain-aware routing to select the appropriate analytical technique—such as benchmarking, trend analysis, or seasonal analysis—based on the query's intent. The planner uses few-shot learning to guide the LLM in selecting the correct resolution path. The data workflow executor retrieves data and may invoke analytical tools for transformation. The generation process for the insight generator is more complex, utilizing customized prompting that incorporates domain-specific knowledge, prompt templates, and few-shot examples provided by domain experts. This ensures that the generated insights are not only accurate but also contextually relevant and actionable. The entire process is supported by a memory system that stores tool metadata and planner examples, enabling the LLM to effectively plan and execute tasks. The figure below illustrates the planning phase, where the LLM evaluates the query, determines if it can be answered using available tools, and decomposes the task into a sequence of steps, ultimately producing an intermediate thought process and a final output that includes the necessary API calls and calculations.

Experiment

  • AE-based OOD detection is highly efficient, processing samples in under 0.01s and achieving higher precision than LLM-based methods; recall can be improved by expanding the in-domain training set.
  • Branch routing achieves 83% accuracy with 0.3s latency per case, significantly outperforming LLM-based classification in both speed and accuracy.
  • Human evaluation of end-to-end IA responses shows 89.5% overall accuracy across 100 questions, with 57 deemed in-scope; system latency averages 13.56s at P90.

The authors use a finetuned BERT model for branch routing, achieving higher accuracy and significantly lower latency compared to an LLM-based approach. Results show that the BERT model provides a more efficient and effective solution for routing decisions in the system.

Results show that the auto-encoder-based method achieves higher precision and significantly faster running time compared to the LLM-based few-shot approach, while the LLM method demonstrates better recall. The overall performance indicates a trade-off between speed and accuracy, with the auto-encoder method being more efficient for real-time applications.

Results show that the system achieves high question-level accuracy, with 89.5% of responses correctly classified as true. The evaluation indicates strong performance in identifying in-scope questions, with a low error rate of 10.5% for false classifications.

Results show that the system achieves high average scores across relevance, correctness, and completeness, with strong consistency indicated by low standard deviations and a high number of samples. The evaluation demonstrates reliable performance in generating accurate and comprehensive responses, supported by robust metrics across key dimensions.


AI로 AI 구축

아이디어에서 출시까지 — 무료 AI 코코딩, 즉시 사용 가능한 환경, 최적의 GPU 가격으로 AI 개발을 가속화하세요.

AI 협업 코딩
바로 사용 가능한 GPU
최적의 가격

HyperAI Newsletters

최신 정보 구독하기
한국 시간 매주 월요일 오전 9시 에 이번 주의 최신 업데이트를 메일로 발송합니다
이메일 서비스 제공: MailChimp
인사이트 에이전트: 데이터 인사이트를 위한 LLM 기반 다중 에이전트 시스템 | 문서 | HyperAI초신경