Sakana AI Unlocks Collaborative AI Dream Teams

2025-07-05 | Ben Dickson | 4 minutes read
Tags: AI Collaboration, LLM, Inference Scaling

Japanese AI lab Sakana AI has unveiled a technique that lets multiple large language models (LLMs) work on a problem as a team. The method, called Multi-LLM AB-MCTS and detailed in an accompanying paper, lets different models collaborate through trial and error and combine their individual strengths to tackle problems that are too complex for any single model.

For businesses, this represents a significant shift. Instead of relying on a single AI provider, companies can now orchestrate a team of specialized models, assigning the right AI to the right part of a task to achieve superior performance and more robust solutions.

The Power of Collective Intelligence

Frontier AI models are advancing at an incredible pace, but each one possesses unique strengths and weaknesses based on its architecture and training data. One model might be a coding prodigy, while another excels at creative prose. Sakana AI's researchers see this diversity not as a flaw, but as a powerful feature.

“We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,” the researchers explained in their blog post. They draw a parallel to human innovation, where diverse teams consistently produce the greatest achievements. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”

Thinking Smarter at Inference Time

Sakana AI’s algorithm is a form of “inference-time scaling,” a research area focused on boosting a model's performance by allocating more computation after it has been trained. This contrasts with the more common “training-time scaling,” which involves building larger models and using more data.

Other inference-time techniques include prompting models for longer chain-of-thought reasoning and using repeated sampling to generate several candidate solutions. Sakana AI’s method builds on both ideas by deciding, at each step, whether to refine an existing answer or sample a new one.

“Our framework offers a smarter, more strategic version of Best-of-N (aka repeated sampling),” Takuya Akiba, a research scientist at Sakana AI, told VentureBeat. “By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks.”
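For reference, plain Best-of-N can be sketched in a few lines of Python. This is a minimal illustration of repeated sampling, not Sakana AI's code: `toy_generate` and `toy_score` are hypothetical stand-ins for an LLM call and an external evaluator.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Draw n independent candidates and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best, best_score = "", float("-inf")
    for _ in range(n):
        candidate = generate(rng)   # one "LLM call" (stand-in)
        s = score(candidate)        # external evaluator (stand-in)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


# Toy stand-ins (assumptions, not real model calls): the "model" guesses
# an integer and the scorer rewards guesses close to 42.
def toy_generate(rng: random.Random) -> str:
    return str(rng.randint(0, 100))


def toy_score(answer: str) -> float:
    return -abs(int(answer) - 42)


if __name__ == "__main__":
    print(best_of_n(toy_generate, toy_score, n=1))
    print(best_of_n(toy_generate, toy_score, n=50))
```

Every sample here starts from scratch and the budget is spent uniformly. Nothing is refined and nothing is learned between calls, which is the limitation Akiba's comment points to.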

How Adaptive Branching Search Works

The technology is powered by an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It lets an LLM run a guided trial-and-error process that balances two strategies: “searching deeper” to refine a promising solution, and “searching wider” to generate entirely new ones. It builds on Monte Carlo Tree Search (MCTS), the decision-making algorithm used in DeepMind’s AlphaGo.

Figure: Different test-time scaling strategies. Source: Sakana AI
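The deeper-versus-wider trade-off can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the paper's algorithm: it assumes evaluator scores in [0, 1], keeps a Beta posterior per node and per "branch here" choice, and uses Thompson sampling to decide at each level whether to widen or descend. The `Node` class, `step`, and the toy task are all hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Generate = Callable[[Optional[float], random.Random], Tuple[float, float]]


@dataclass
class Node:
    state: Optional[float]          # None marks the root (no answer yet)
    score: float = 0.0              # evaluator score in [0, 1]
    children: List["Node"] = field(default_factory=list)
    alpha: float = 1.0              # Beta posterior: quality of this subtree
    beta: float = 1.0
    gen_alpha: float = 1.0          # Beta posterior: value of branching here
    gen_beta: float = 1.0


def step(root: Node, generate: Generate, rng: random.Random) -> Node:
    """Spend one generation call, choosing where to spend it by sampling."""
    node, path = root, []
    while node.children:
        widen = rng.betavariate(node.gen_alpha, node.gen_beta)
        draws = [rng.betavariate(c.alpha, c.beta) for c in node.children]
        if widen >= max(draws):
            break                                      # go wider at this node
        node = node.children[draws.index(max(draws))]  # go deeper
        path.append(node)
    state, score = generate(node.state, rng)
    child = Node(state, score, alpha=1.0 + score, beta=2.0 - score)
    node.children.append(child)
    node.gen_alpha += score                 # credit the branching choice
    node.gen_beta += 1.0 - score
    for visited in path:                    # credit the subtrees descended
        visited.alpha += score
        visited.beta += 1.0 - score
    return child


TARGET = 0.7  # hidden optimum of the toy task (assumption)


def toy_generate(parent: Optional[float], rng: random.Random) -> Tuple[float, float]:
    """Fresh guess from the root, small refinement from any other node."""
    guess = rng.random() if parent is None else parent + rng.gauss(0.0, 0.05)
    guess = min(1.0, max(0.0, guess))
    return guess, 1.0 - abs(guess - TARGET)


def all_nodes(root: Node) -> List[Node]:
    """Collect every generated node (the root itself is excluded)."""
    stack, out = [root], []
    while stack:
        current = stack.pop()
        out.extend(current.children)
        stack.extend(current.children)
    return out


def search(budget: int, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(state=None)
    for _ in range(budget):
        step(root, toy_generate, rng)
    return root


if __name__ == "__main__":
    tree = search(budget=40)
    print(max(n.score for n in all_nodes(tree)))
```

Generating a new child at a shallow node is "wider", while descending into a child before generating is "deeper". Because both choices are sampled from posteriors that update after every call, the shape of the tree adapts to the task instead of being fixed in advance.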

The Multi-LLM version takes this a step further. It not only decides what to do next (refine or generate) but also which LLM is best for the job. The system learns on the fly, initially testing a mix of models and then allocating more work to those that prove most effective for the specific task.
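One way to get this "try everything, then favor what works" behavior is a multi-armed bandit over the models. The Python sketch below assumes Thompson sampling with a Beta posterior per model and evaluator scores in [0, 1]. The model names and the toy scoring functions are hypothetical stand-ins, not real API calls.

```python
import random
from typing import Callable, Dict

Model = Callable[[random.Random], float]  # returns an evaluator score in [0, 1]


def pick_models(models: Dict[str, Model], budget: int, seed: int = 0) -> Dict[str, int]:
    """Allocate `budget` calls across models and return the call counts."""
    rng = random.Random(seed)
    posterior = {name: [1.0, 1.0] for name in models}   # Beta(alpha, beta) per model
    calls = {name: 0 for name in models}
    for _ in range(budget):
        # Sample a plausible quality for each model and call the best draw.
        draws = {name: rng.betavariate(a, b) for name, (a, b) in posterior.items()}
        chosen = max(draws, key=draws.get)
        score = models[chosen](rng)
        posterior[chosen][0] += score          # evidence the model helps
        posterior[chosen][1] += 1.0 - score    # evidence it does not
        calls[chosen] += 1
    return calls


def make_toy_model(mean: float) -> Model:
    """Hypothetical model whose answers score around `mean` on this task."""
    def model(rng: random.Random) -> float:
        return min(1.0, max(0.0, rng.gauss(mean, 0.1)))
    return model


if __name__ == "__main__":
    team = {
        "model_a": make_toy_model(0.2),
        "model_b": make_toy_model(0.5),
        "model_c": make_toy_model(0.8),
    }
    print(pick_models(team, budget=300))
```

Early on the posteriors are wide, so all three models get tried. As scores accumulate, most of the remaining budget flows to whichever model performs best on this particular task.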

Putting the AI ‘Dream Team’ to the Test

The researchers benchmarked their Multi-LLM AB-MCTS system on the notoriously difficult ARC-AGI-2 benchmark, which tests human-like visual reasoning. They assembled a team of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1.

The results were striking. The AI collective solved over 30% of the test problems, a score that far surpassed what any of the models could achieve individually.

Figure: AB-MCTS vs. individual models. Source: Sakana AI

More impressively, the team observed models collaborating to solve problems that no single model could solve on its own. In one instance, o4-mini produced a flawed solution. The system then passed this incorrect attempt to DeepSeek-R1 and Gemini 2.5 Pro, which identified the error, corrected it, and arrived at the right answer.

Figure: AB-MCTS can select different models at different stages of solving a problem. Source: Sakana AI

This collaborative approach also has major implications for mitigating AI hallucinations. “By creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness,” Akiba noted.

From Research to Real-World Applications

To empower developers to use this technique, Sakana AI has released the core algorithm as an open-source framework called TreeQuest. Available under a commercial-friendly Apache 2.0 license, TreeQuest provides a flexible API for implementing Multi-LLM AB-MCTS on custom tasks.

Akiba mentioned that beyond the benchmark, the team successfully applied the method to complex algorithmic coding and to improving the accuracy of machine learning models. “AB-MCTS could also be highly effective for problems that require iterative trial-and-error, such as optimizing performance metrics of existing software,” he said. The release of this practical tool could usher in a new era of powerful and reliable enterprise AI applications built on collective intelligence.