
Sakana AI Unveils Multi-Model Team Technique, Boosting LLM Performance by 30%


On July 3, 2025, Japanese AI lab Sakana AI introduced a technique called Multi-LLM AB-MCTS, which allows multiple large language models (LLMs) to collaborate on a single task, enhancing their collective performance by up to 30%. The method is a form of "inference-time scaling": rather than improving a model through more training ("training-time scaling"), it improves results by spending computational resources more strategically at the inference stage.

Key Points of Multi-LLM AB-MCTS

1. Combining Different Strengths
Each LLM has its own strengths and weaknesses, shaped by its training data and architecture; one model might excel at coding while another is better at creative writing. Sakana AI's researchers treat these differences as assets rather than limitations, arguing that a diverse team of models can achieve more than any individual model.

2. Dynamic and Strategic Decision-Making
At the heart of the method is Adaptive Branching Monte Carlo Tree Search (AB-MCTS), an algorithm that balances "searching deeper" against "searching wider." Searching deeper means refining an already promising answer; searching wider means generating new candidate solutions from scratch. The algorithm uses probability models to decide which strategy to apply at each step, so the available compute budget is spent where it is most likely to pay off.

3. Model Selection and Adaptation
Multi-LLM AB-MCTS not only chooses the best strategy at each step but also selects the most appropriate LLM to carry it out. The system starts with a mix of models and gradually learns which ones are more effective for the task at hand, routing subproblems to the models best suited to them. (A simplified sketch of this idea appears after the Practical Applications section below.)

Testing and Results

4. ARC-AGI-2 Benchmark
The researchers evaluated the system on ARC-AGI-2, a successor to the Abstraction and Reasoning Corpus (ARC) benchmark, which is notoriously difficult because it tests a human-like ability to solve novel visual reasoning problems. An ensemble of o4-mini, Gemini 2.5 Pro, and DeepSeek-R1 solved over 30% of the 120 test problems, outperforming any individual model, and the system demonstrated that it could dynamically assign tasks to the model most likely to solve them.

5. Solving Previously Impossible Problems
One of the most striking outcomes was the system's ability to solve problems that no single model could solve on its own. In one example, an initially incorrect answer from o4-mini was corrected by DeepSeek-R1 and Gemini 2.5 Pro, leading to the right solution. This illustrates how Multi-LLM AB-MCTS can overcome the limitations of individual models.

Practical Applications

6. Open-Source Framework: TreeQuest
To encourage broader adoption, Sakana AI released the underlying algorithm as an open-source framework called TreeQuest, available under the Apache 2.0 license. TreeQuest offers a flexible API that lets developers and businesses integrate Multi-LLM AB-MCTS into their own applications with custom scoring and logic.

7. Business-Oriented Tasks
The researchers also explored AB-MCTS in business contexts. Beyond visual reasoning, the system showed promise for complex algorithmic coding and for improving the accuracy of machine learning models. For example, AB-MCTS could automate the trial-and-error process of optimizing performance metrics, such as reducing the response latency of a web service.
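The article describes AB-MCTS only at a conceptual level. As a rough illustration, the Python sketch below shows one plausible way the "deepen vs. widen" decision and the model-selection step could be combined, using Thompson sampling as the "probability model" the article mentions. All names here (the models dict, generate, refine, evaluate) are hypothetical placeholders; this is not the TreeQuest API, just a minimal sketch of the adaptive-branching idea under those assumptions.

```python
import random

class Candidate:
    """One candidate answer in the search pool."""
    def __init__(self, text, score, model):
        self.text, self.score, self.model = text, score, model

def thompson_pick(stats):
    """Thompson sampling: draw a Beta sample per key from its
    (successes, failures) counts and return the key with the highest draw."""
    return max(stats, key=lambda k: random.betavariate(stats[k][0] + 1,
                                                       stats[k][1] + 1))

def multi_llm_ab_mcts(models, generate, refine, evaluate, budget=30):
    """Hypothetical interface:
    models:   dict name -> LLM wrapper object
    generate: (model) -> fresh answer string         ("search wider")
    refine:   (model, answer) -> improved answer     ("search deeper")
    evaluate: (answer) -> score in [0, 1]
    """
    model_stats = {name: [0, 0] for name in models}     # per-model wins/losses
    action_stats = {"widen": [0, 0], "deepen": [0, 0]}  # per-action wins/losses
    pool = []

    for _ in range(budget):
        action = thompson_pick(action_stats)            # deepen or widen?
        name = thompson_pick(model_stats)               # which model to call?
        model = models[name]

        if action == "widen" or not pool:
            answer = generate(model)                    # brand-new solution
        else:
            best = max(pool, key=lambda c: c.score)
            answer = refine(model, best.text)           # polish a promising one

        score = evaluate(answer)
        improved = not pool or score > max(c.score for c in pool)
        pool.append(Candidate(answer, score, name))

        # Credit the chosen model and action when they beat the previous best.
        for stats in (model_stats[name], action_stats[action]):
            stats[0 if improved else 1] += 1

    return max(pool, key=lambda c: c.score)
```

Because the Beta draws concentrate around observed success rates, the budget naturally drifts toward the models and actions that have been producing improvements, which mirrors the article's description of the system gradually identifying which models work best for a given task. TreeQuest's actual API and internals may differ substantially from this sketch.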
Impact and Industry Perspective

Takuya Akiba, a research scientist at Sakana AI, highlighted the significance of the ensemble approach for mitigating issues such as hallucination, a common failure mode in which an LLM generates false information. By combining models with different tendencies to hallucinate, businesses can strike a balance between powerful logical capabilities and reliability.

The introduction of Multi-LLM AB-MCTS and TreeQuest could reshape the development of enterprise AI applications, making them more powerful, versatile, and trustworthy. Industry experts agree that the technique addresses a crucial gap in the current AI landscape, where single models often fall short on complex, multi-faceted tasks.

Sakana AI is a respected player in the AI research community, known for its innovative approaches to AI challenges. Its focus on improving LLM performance through collaboration aligns with the growing trend of combining diverse AI systems to achieve better outcomes, and with the release of TreeQuest the company has given developers and businesses a practical tool for getting more out of their existing AI investments.

This development underscores an ongoing shift in AI research toward more sophisticated, cooperative methods. It paves the way for AI systems that can handle tasks requiring intricate reasoning and problem-solving, with potential applications ranging from healthcare to finance.
