5 July, 2025
Sakana AI's TreeQuest Revolutionizes AI Collaboration With Multi-Model Teams

In a groundbreaking development, Japanese AI laboratory Sakana AI has unveiled a novel technique that allows multiple large language models (LLMs) to work collaboratively on a single task. This innovative method, dubbed Multi-LLM AB-MCTS, effectively creates a “dream team” of AI agents, enabling models to perform trial-and-error and leverage their unique strengths to tackle complex problems. This approach promises to enhance AI systems’ capabilities, offering enterprises a more robust alternative to relying on a single model or provider.

The announcement comes as frontier AI models are rapidly evolving, each with distinct strengths and weaknesses due to their unique training data and architecture. Sakana AI researchers argue that these differences are not limitations but valuable resources for fostering collective intelligence. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model,” the researchers stated in a blog post.

The Power of Collective Intelligence

Sakana AI’s approach capitalizes on the diverse aptitudes of different AI models. While one model might excel at coding, another might be better suited for creative writing. The researchers believe that, just as humanity’s greatest achievements have come from diverse teams, AI systems can achieve more by working together. This philosophy underpins the Multi-LLM AB-MCTS method, which allows businesses to dynamically leverage the best aspects of different models, assigning the right AI to the right part of a task for superior results.

According to Takuya Akiba, a research scientist at Sakana AI and co-author of the paper, “Our framework offers a smarter, more strategic version of Best-of-N (aka repeated sampling). It complements reasoning techniques like long chain-of-thought (CoT) through reinforcement learning. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks.”

Thinking Longer at Inference Time

The Multi-LLM AB-MCTS method is an “inference-time scaling” technique, a burgeoning area of research that focuses on improving model performance by allocating more computational resources after training. This contrasts with “training-time scaling,” which involves making models bigger and training them on larger datasets. Sakana AI’s work builds on popular inference-time methods, including reinforcement learning that trains models to generate longer, more detailed CoT sequences, and repeated sampling, in which a model is given the same prompt multiple times to produce a variety of candidate solutions.
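For readers unfamiliar with repeated sampling, the sketch below illustrates the Best-of-N baseline that AB-MCTS generalizes: the same prompt is sent to a model several times and the highest-scoring answer is kept. The `call_llm` and `score_solution` functions are hypothetical stand-ins, not part of Sakana AI's code.

```python
# Minimal sketch of repeated sampling (Best-of-N), the baseline AB-MCTS builds on.
# `call_llm` and `score_solution` are hypothetical placeholders for a model API
# call (with temperature > 0) and a task-specific scorer.
import random

def call_llm(prompt: str) -> str:
    # Placeholder: in practice, call a model API and return its answer.
    return f"candidate-{random.randint(0, 10_000)} for: {prompt}"

def score_solution(solution: str) -> float:
    # Placeholder: e.g. run unit tests for code, or use a verifier model's score.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Sample n independent answers to the same prompt and keep the best one.
    candidates = [call_llm(prompt) for _ in range(n)]
    return max(candidates, key=score_solution)

print(best_of_n("Solve the puzzle", n=8))
```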

How Adaptive Branching Search Works

The core of this new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to perform trial-and-error by intelligently balancing two search strategies: “searching deeper” and “searching wider.” Searching deeper involves refining a promising answer, while searching wider generates completely new solutions. AB-MCTS combines these approaches, allowing the system to improve a good idea or pivot to a new direction if necessary.

At each step, AB-MCTS uses probability models to decide whether to refine an existing solution or generate a new one. The researchers enhanced this with Multi-LLM AB-MCTS, which not only determines “what” to do (refine vs. generate) but also “which” LLM should do it. Initially, the system tries a balanced mix of available LLMs and learns which models are more effective, allocating more workload to them over time.
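As a rough, hypothetical sketch of that idea, the loop below uses Thompson sampling over Beta posteriors to decide both what to do (refine vs. generate) and which model should do it. The real Multi-LLM AB-MCTS maintains a full search tree and richer probability models; the flat solution pool, model names, scoring, and success threshold here are simplifications for illustration only.

```python
# Simplified, hypothetical sketch of an adaptive "refine vs. generate" loop with
# per-model selection, using Thompson sampling over Beta posteriors.
# `generate_with` and `refine_with` are stand-ins for LLM calls; scores lie in [0, 1].
import random

MODELS = ["model-a", "model-b", "model-c"]  # hypothetical model names

def generate_with(model: str, task: str) -> tuple[str, float]:
    # Placeholder: ask `model` for a brand-new solution and score it.
    return f"{model} answer to {task}", random.random()

def refine_with(model: str, solution: str) -> tuple[str, float]:
    # Placeholder: ask `model` to improve an existing solution and re-score it.
    return f"{solution} (refined by {model})", random.random()

def thompson_pick(stats: dict) -> str:
    # Sample each arm's Beta(successes + 1, failures + 1) posterior; pick the max.
    return max(stats, key=lambda k: random.betavariate(stats[k][0] + 1, stats[k][1] + 1))

def ab_mcts_like_search(task: str, budget: int = 30) -> tuple[str, float]:
    solutions: list[tuple[str, float]] = []                 # pool of (solution, score)
    action_stats = {"generate": [0, 0], "refine": [0, 0]}   # [successes, failures]
    model_stats = {m: [0, 0] for m in MODELS}

    for _ in range(budget):
        # Decide WHAT to do (search wider vs. deeper) and WHICH model should do it.
        action = thompson_pick(action_stats) if solutions else "generate"
        model = thompson_pick(model_stats)

        if action == "generate":
            solution, score = generate_with(model, task)
        else:
            base, _ = max(solutions, key=lambda s: s[1])    # refine the current best
            solution, score = refine_with(model, base)

        solutions.append((solution, score))

        # Treat a score above 0.5 as a "success" when updating the posteriors.
        outcome = 0 if score > 0.5 else 1
        action_stats[action][outcome] += 1
        model_stats[model][outcome] += 1

    return max(solutions, key=lambda s: s[1])

print(ab_mcts_like_search("ARC-style puzzle"))
```

Over many iterations, arms (actions or models) that keep producing high scores get sampled more often, which mirrors the article's description of the system learning which models are more effective and routing more of the workload to them.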

Putting the AI ‘Dream Team’ to the Test

The Multi-LLM AB-MCTS system was tested on the ARC-AGI-2 benchmark, designed to test human-like abilities to solve novel visual reasoning problems. The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1. The ensemble of models achieved correct solutions for over 30% of the 120 test problems, significantly outperforming any single model.

In one instance, a solution generated by the o4-mini model was incorrect. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini 2.5 Pro, which analyzed the error, corrected it, and ultimately produced the right answer. “This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence,” the researchers noted.

“In addition to the individual pros and cons of each model, the tendency to hallucinate can vary significantly among them,” Akiba said. “By creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness.”

From Research to Real-World Applications

To facilitate the application of this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under an Apache 2.0 license. TreeQuest provides a flexible API, allowing users to implement Multi-LLM AB-MCTS for their tasks with custom scoring and logic.
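The snippet below is a hypothetical illustration of that integration pattern: per-model generator functions and a custom scoring function plugged into a simple search loop. None of the names are taken from the actual TreeQuest API, whose real interface is documented in the open-source repository; `run_candidate_tests` and the model wrappers are stand-in stubs.

```python
# Hypothetical illustration of the "custom scoring and logic" integration pattern.
# These names are NOT from the actual TreeQuest API (see its repository for the
# real interface); the stubs below only show the shape of the integration.
import random
from typing import Callable

def run_candidate_tests(candidate: str) -> float:
    # Stand-in for task-specific scoring, e.g. the fraction of unit tests a
    # generated program passes. Here it just returns a random score in [0, 1].
    return random.random()

def make_model_wrapper(name: str) -> Callable[[str], str]:
    # Stand-in for a real API client: takes the previous attempt (possibly empty)
    # and returns a new or refined candidate solution.
    def wrapper(previous_attempt: str) -> str:
        return f"{name} proposal building on: {previous_attempt or '(fresh start)'}"
    return wrapper

generators = {name: make_model_wrapper(name)
              for name in ("o4-mini", "gemini-2.5-pro", "deepseek-r1")}

def simple_multi_model_search(generators, score, budget: int = 20) -> tuple[str, float]:
    # Naive stand-in for the framework's search loop: round-robin the models and
    # always refine the current best candidate. The real algorithm instead chooses
    # adaptively between refining and generating, and between models.
    best, best_score = "", -1.0
    models = list(generators)
    for i in range(budget):
        candidate = generators[models[i % len(models)]](best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

print(simple_multi_model_search(generators, run_candidate_tests))
```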

“While we are in the early stages of applying AB-MCTS to specific business-oriented problems, our research reveals significant potential in several areas,” Akiba mentioned. Beyond the ARC-AGI-2 benchmark, the team successfully applied AB-MCTS to tasks like complex algorithmic coding and improving machine learning models’ accuracy.

The release of this practical, open-source tool could pave the way for a new class of more powerful and reliable enterprise AI applications. As businesses explore the possibilities of AI collaboration, Sakana AI’s TreeQuest could play a pivotal role in the evolution of AI technology.