
Adaptive LLM Routing under Budget Constraints Using Contextual Bandits and Preference-Aware Embeddings

Large Language Models (LLMs) have transformed natural language processing, yet their diverse performance profiles and variable costs pose significant challenges for real-world deployment. LLM routing addresses this by dynamically selecting the most appropriate model for each incoming query or task. Traditional methods treat routing as a supervised learning problem, relying on pre-existing knowledge of optimal query-LLM pairings. Such assumptions rarely hold in practice, where query distributions evolve over time and comprehensive ground-truth mappings are unavailable. To overcome these limitations, this work reframes LLM routing as a contextual bandit problem, enabling adaptive, data-driven decisions from online feedback; unlike supervised approaches, it does not require exhaustive evaluation of every LLM on every query.

Central to the method is a shared embedding space in which both queries and LLMs are represented as vectors, aligned to reflect their compatibility. The initial structure of this space is learned from offline human preference data and subsequently refined with real-time bandit feedback. Building on this representation, the work introduces PILOT (Preference-prior Informed LinUCB for adaptive routing), an extension of the LinUCB algorithm that leverages prior knowledge from human preferences to guide exploration and improve convergence. Incorporating this prior yields faster learning and more accurate routing decisions, especially in early stages when data is scarce.

To accommodate diverse user constraints, routing under budget limitations is further modeled as a multi-choice knapsack problem. This allows the system to balance model performance against cost, ensuring efficient resource utilization while meeting user-defined budget caps. The resulting framework enables dynamic, cost-aware routing that adapts to both changing query patterns and evolving user requirements.
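The core idea of a preference-prior-informed LinUCB can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, the `prior_thetas` warm-start (initializing each arm's estimate `A⁻¹b` to an LLM embedding learned offline from preferences), and the reward scale are all illustrative assumptions; only the UCB score and rank-one update are standard LinUCB.

```python
import numpy as np

class PriorInformedLinUCB:
    """Sketch of a LinUCB router warm-started with a preference prior.

    Hypothetical illustration: prior_thetas[a] stands in for the embedding
    of LLM `a` learned offline from human preference data. PILOT's actual
    update rules may differ.
    """

    def __init__(self, prior_thetas, alpha=1.0, lam=1.0):
        self.alpha = alpha
        d = len(prior_thetas[0])
        # Ridge term lam * I, one design matrix per arm (LLM).
        self.A = [lam * np.eye(d) for _ in prior_thetas]
        # Warm-start b so the initial estimate A^-1 b equals the prior.
        self.b = [lam * np.asarray(t, dtype=float) for t in prior_thetas]

    def select(self, x):
        """Return the LLM index with the highest UCB score for query embedding x."""
        x = np.asarray(x, dtype=float)
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Exploitation term + optimism bonus shrinking with data.
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Standard LinUCB rank-one update from observed feedback."""
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

With this warm start, an arm whose prior aligns with a query is tried first, but repeated low rewards shrink its estimate and the router moves on, which is the intended effect of using the prior only to guide early exploration.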
The approach is evaluated in realistic settings simulating dynamic workloads and varying budgets. Results demonstrate that PILOT outperforms baseline routing methods in both effectiveness and efficiency, particularly under tight budget constraints. The method effectively learns from limited feedback, adapts to new query types, and maintains high-quality routing decisions over time. This work advances the state of the art in LLM routing by integrating preference-based priors, contextual bandit learning, and budget-aware optimization into a unified, scalable framework. It offers a practical solution for deploying LLMs efficiently in production environments where cost, performance, and adaptability are critical.
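The budget-constrained routing described above, framed as a multi-choice knapsack, can be sketched with a small dynamic program: pick exactly one LLM per query so that total predicted quality is maximized and total cost stays within the budget. The `quality`/`cost` inputs, integer costs, and function name are illustrative assumptions, not the paper's solver.

```python
def route_under_budget(quality, cost, budget):
    """Multi-choice knapsack sketch: choose one LLM per query maximizing
    total predicted quality subject to total integer cost <= budget.

    quality[i][j] and cost[i][j] are the (assumed) estimated quality and
    cost of running LLM j on query i. Returns a list of chosen LLM indices,
    or None if no assignment fits the budget.
    """
    NEG = float("-inf")
    dp = [NEG] * (budget + 1)
    dp[0] = 0.0
    # choice[i][b] = (llm picked for query i, budget used before query i)
    choice = [[None] * (budget + 1) for _ in quality]
    for i, (qs, cs) in enumerate(zip(quality, cost)):
        ndp = [NEG] * (budget + 1)
        for b in range(budget + 1):
            if dp[b] == NEG:
                continue
            for j, (q, c) in enumerate(zip(qs, cs)):
                nb = b + c
                if nb <= budget and dp[b] + q > ndp[nb]:
                    ndp[nb] = dp[b] + q
                    choice[i][nb] = (j, b)
        dp = ndp
    best_b = max(range(budget + 1), key=lambda b: dp[b])
    if dp[best_b] == NEG:
        return None  # infeasible: even the cheapest choices exceed the budget
    picks = [0] * len(quality)
    b = best_b
    for i in range(len(quality) - 1, -1, -1):
        picks[i], b = choice[i][b]
    return picks
```

For example, with two queries, a cheap weak model (quality 1, cost 1) and a strong expensive one (quality 3, cost 3), a budget of 4 forces a mixed assignment, while a budget of 6 routes both queries to the strong model: this is the quality/cost trade-off the abstract describes.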
