[2026.06.28] Complete training implementations & data (🤗 HF Datasets), and inference & evaluation pipelines for RecursiveMAS are now available! Also check out our updated project website!
[2026.06.25] Try it out! RecursiveMAS now has a 🧩YouTube tutorial and an 🎮interactive playground demo! Special thanks to @TwoMinutePapers and @vishalmysore!
[2026.05.24] Check out the VentureBeat article featuring our research on RecursiveMAS!
[2026.05.01] Our paper is featured as 🤗 HuggingFace 1st Paper of the Week/Day!
[2026.04.28] All collaboration styles and model checkpoints, with example downstream inference, are now available. Stay tuned for the complete training/inference pipeline and additional features!
[2026.04.28] We have released the RecursiveMAS paper!
RecursiveMAS is a multi-agent framework that scales agent collaboration through latent-space recursion. Rather than treating each LLM agent as an isolated module, RecursiveMAS casts the whole multi-agent system as a unified recursive computation.
Heterogeneous agents are connected by lightweight RecursiveLink modules that let them exchange, refine, and evolve latent states across recursion rounds.
Correspondingly, we design an Inner-Outer Loop training paradigm for progressive co-optimization. The inner loop provides a preliminary model-level warm start for each agent. The outer loop then trains the outer RecursiveLink modules across agents at the system level.
Across 9 benchmarks spanning mathematics, science, medicine, search, and code generation, RecursiveMAS improves multi-agent coordination by recursively refining shared latent states, delivering stronger performance across the sequential, mixture, distillation, and deliberation collaboration styles.
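To make the control flow concrete, here is a toy sketch of the recursion described above. It is not the RecursiveMAS implementation: the agents are stand-in functions on small float vectors, `link` is a hypothetical placeholder for a RecursiveLink module, and every name and number is an assumption chosen for illustration. It only shows the shape of the computation, in which each agent refines a latent state, a link hands it to the next agent, and the last agent's state feeds the next recursion round.

```python
from typing import Callable, List, Sequence

Latent = List[float]
Agent = Callable[[Latent], Latent]


def link(state: Latent, scale: float = 0.5) -> Latent:
    """Hypothetical stand-in for a RecursiveLink: a fixed rescaling.

    A real link would be a small trained module, not a constant factor.
    """
    return [scale * x for x in state]


def recursive_mas(agents: Sequence[Agent], state: Latent, rounds: int) -> Latent:
    """Pass a latent state through every agent, for several recursion rounds."""
    for _ in range(rounds):
        for agent in agents:
            # Each agent refines the state; the link passes it onward.
            state = link(agent(state))
    return state


# Toy agents (assumptions): each one nudges the latent state differently.
planner: Agent = lambda s: [x + 1.0 for x in s]
critic: Agent = lambda s: [x * 2.0 for x in s]
solver: Agent = lambda s: [x - 1.0 for x in s]

final_state = recursive_mas([planner, critic, solver], [0.0, 4.0], rounds=2)
print(final_state)
```

Increasing `rounds` repeats the same agent chain on the evolving state, which is the sense in which the whole system behaves as one recursive computation.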
✅ Release All Collaboration Patterns (Sequential, Mixture, Deliberation, Distillation).
✅ Release Demo Code for Inference (Commands Provided Below).
✅ Release Complete Inference Pipeline Across All Downstreams.
✅ Release All Training Data & Pipeline Implementation.
☑️ Add Support for Additional Model Families & MAS Collaboration Patterns.
```
RecursiveMAS/
├── README.md
├── requirements.txt
├── inference/           # inference pipeline and downstream tasks evaluation
│   ├── run.py
│   ├── README.md
│   ├── dataset/
│   └── inference_utils/
└── train/               # inner-outer loop training pipeline
    ├── train_inner.py
    ├── train_outer.py
    ├── README.md
    ├── data/
    └── outer/
```
Create a clean Python environment and install all project requirements from the repository root:
```bash
conda create -n recursivemas python=3.10 -y
conda activate recursivemas
```

Install the required packages:

```bash
pip install -r requirements.txt
```

For Deliberation-style runs on the search datasets (`bamboogle`, `hotpotqa`), the Tool-Caller agent queries a real web-search API (e.g., Tavily). Please put your search API key in a plain-text file and pass it with `--tavily_keys_file`:
```text
# e.g., keys.txt
tvly-xxxxxxxxxxxxxxxxxxxxxxxx
```

To enable grading of open-ended questions by an LLM judge (e.g., through an OpenAI-compatible API), configure the judge with the following environment variables:
```bash
export API_KEY=...        # bearer token for the judge endpoint
export API_BASE_URL=...   # OpenAI-compatible base or chat-completions URL
export API_MODEL=...      # judge model id
```

To play around with RecursiveMAS, you can download our reference checkpoints under the RecursiveMAS Hugging Face organization.
📌 Kind Note: The released Hugging Face checkpoints are provided for quick, plug-and-play exploration and as reference systems. They are NOT a drop-in replacement for the task-specific training setups used across the paper.
The paper covers different collaboration styles and task-specific data settings. To reproduce the full paper results, please follow the training and inference pipeline below for complete downstream-task evaluation.
The checkpoints are organized by MAS collaboration style. Each collection contains (i) the individual role-specific agents, and (ii) their (inner/outer) RecursiveLink modules:
| Agent Organization | Download |
|---|---|
| Sequential-Light-Planner-Qwen3-1.7B | 🤗 HuggingFace |
| Sequential-Light-Critic-Llama3.2-1B | 🤗 HuggingFace |
| Sequential-Light-Solver-Qwen2.5-Math-1.5B | 🤗 HuggingFace |
| Sequential-Light-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
|---|---|
| Sequential-Scaled-Planner-Gemma3-4B | 🤗 HuggingFace |
| Sequential-Scaled-Critic-Llama3.2-3B | 🤗 HuggingFace |
| Sequential-Scaled-Solver-Qwen3.5-4B | 🤗 HuggingFace |
| Sequential-Scaled-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
|---|---|
| Mixture-Math-DeepSeek-R1-Distill-Qwen-1.5B | 🤗 HuggingFace |
| Mixture-Code-Qwen2.5-Coder-3B | 🤗 HuggingFace |
| Mixture-Science-BioMistral-7B | 🤗 HuggingFace |
| Mixture-Summarizer-Qwen3.5-2B | 🤗 HuggingFace |
| Mixture-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
|---|---|
| Distillation-Expert-Qwen3.5-9B | 🤗 HuggingFace |
| Distillation-Learner-Qwen3.5-4B | 🤗 HuggingFace |
| Distillation-Outerlinks | 🤗 HuggingFace |

| Agent Organization | Download |
|---|---|
| Deliberation-Reflector-Qwen3.5-4B | 🤗 HuggingFace |
| Deliberation-Toolcaller-Qwen3.5-4B | 🤗 HuggingFace |
| Deliberation-Outerlinks | 🤗 HuggingFace |
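A minimal sketch for fetching one collection with `huggingface_hub`, assuming each table entry is a repository named `RecursiveMAS/<entry name>` under the organization. That naming scheme, the helper names, and the `checkpoints/` target directory are assumptions, so confirm the repository ids against the download links before relying on them.

```python
from pathlib import Path
from typing import Dict, List

ORG = "RecursiveMAS"  # Hugging Face organization named in this README

# Entry names copied from the Sequential-Light rows of the tables above.
SEQUENTIAL_LIGHT: List[str] = [
    "Sequential-Light-Planner-Qwen3-1.7B",
    "Sequential-Light-Critic-Llama3.2-1B",
    "Sequential-Light-Solver-Qwen2.5-Math-1.5B",
    "Sequential-Light-Outerlinks",
]


def repo_ids(entries: List[str], org: str = ORG) -> List[str]:
    """Build '<org>/<entry>' ids (assumed naming scheme, not confirmed)."""
    return [f"{org}/{name}" for name in entries]


def download_collection(entries: List[str], root: str = "checkpoints") -> Dict[str, Path]:
    """Download every repository of a collection into root/<entry>."""
    # Imported lazily so repo_ids() works without the package installed.
    from huggingface_hub import snapshot_download

    paths: Dict[str, Path] = {}
    for name, repo_id in zip(entries, repo_ids(entries)):
        local_dir = Path(root) / name
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
        paths[name] = local_dir
    return paths


# Listing the ids needs no network access; downloading does.
print(repo_ids(SEQUENTIAL_LIGHT))
```

Calling `download_collection(SEQUENTIAL_LIGHT)` would then place each agent and the outer links in its own subdirectory, which can be handy when passing local paths to the inference script.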
Here is an example of how to load the RecursiveMAS pipeline:
```python
from system_loader import load_mas_system

# Load the Sequential-Light system.
mas = load_mas_system(
    style="sequential_light",
    device="cuda",
    trust_remote_code=True,
)

# Access the underlying model of each role-specific agent.
planner = mas.agents["planner"].model
critic = mas.agents["critic"].model
solver = mas.agents["solver"].model
```

To play around, you can run any collaboration style by passing `--style` and `--dataset`. For example:
```bash
python inference/run.py \
  --style sequential_scaled \
  --dataset math500 \
  --device cuda
```

To reproduce our experiments with task-specific configurations, please train the inner and outer RecursiveLink modules with the matching collaboration style and training data. The overall training includes two phases:
- Inner-Loop Training (`train/train_inner.py`): train each agent's role-specific inner RecursiveLink (frozen base model + a small `ln_res_adapter`).
- Outer-Loop Training (`train/train_outer.py`): connect all agents together and train the outer RecursiveLink between agents through recursion.
An example of the complete training pipeline is:
```bash
# Inner-Loop Training (planner shown; repeat for each agent role)
python train/train_inner.py \
  --model_name_or_path Qwen/Qwen3-1.7B \
  --mas_design sequential \
  --mas_role planner \
  --mas_task math \
  --dataset_name RecursiveMAS/Sequential-Math \
  --save_dir train/ckpts/seq_light/planner_math

# Outer-Loop Training
python train/train_outer.py \
  --style sequential_light \
  --agent1_model_name_or_path Qwen/Qwen3-1.7B \
  --agent2_model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
  --agent3_model_name_or_path Qwen/Qwen2.5-Math-1.5B-Instruct \
  --agent1_inner_aligner_path train/ckpts/seq_light/planner_math \
  --agent2_inner_aligner_path train/ckpts/seq_light/refiner_math \
  --agent3_inner_aligner_path train/ckpts/seq_light/solver_math \
  --mas_task math \
  --dataset_name RecursiveMAS/Sequential-Math \
  --save_dir train/ckpts/seq_light/outer_math
```

Additional detailed per-style commands are provided in our training guide (`train/README.md`).
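The outer-loop command above expects one inner-loop checkpoint per agent, so the inner-loop command has to be run once per role. Below is a small sketch that assembles such a command as an argument list, using only the flags shown in the example. The helper name is hypothetical, and only the `planner` role value appears in this README, so the `--mas_role` values for the other agents should be taken from `train/README.md`.

```python
import shlex
from typing import List


def inner_loop_command(model: str, design: str, role: str, task: str,
                       dataset: str, save_dir: str) -> List[str]:
    """Build the argv for one inner-loop run (flags as in the example above)."""
    return [
        "python", "train/train_inner.py",
        "--model_name_or_path", model,
        "--mas_design", design,
        "--mas_role", role,
        "--mas_task", task,
        "--dataset_name", dataset,
        "--save_dir", save_dir,
    ]


planner_cmd = inner_loop_command(
    model="Qwen/Qwen3-1.7B",
    design="sequential",
    role="planner",
    task="math",
    dataset="RecursiveMAS/Sequential-Math",
    save_dir="train/ckpts/seq_light/planner_math",
)

# Print a copy-pasteable shell line. To launch the run instead, pass the list
# to subprocess.run(planner_cmd, check=True) from the repository root.
print(shlex.join(planner_cmd))
```

Building the command as a list rather than a string avoids shell-quoting mistakes when model ids or paths contain unusual characters.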
We host all training data on Hugging Face Datasets. The table below lists each training set and the training stage that uses it.
| Dataset | Used by |
|---|---|
| 🤗 RecursiveMAS/Sequential-Math | Sequential inner & outer loop training |
| 🤗 RecursiveMAS/Sequential-Code | Sequential inner & outer loop training |
| 🤗 RecursiveMAS/Distillation-Math | Distillation inner & outer loop training |
| 🤗 RecursiveMAS/Distillation-Code | Distillation inner & outer loop training |
| 🤗 RecursiveMAS/Mixture-Math | Mixture math expert inner loop training |
| 🤗 RecursiveMAS/Mixture-Code | Mixture code expert inner loop training |
| 🤗 RecursiveMAS/Mixture-Science | Mixture science expert inner loop training |
| 🤗 RecursiveMAS/Mixture-Summarizer | Mixture summarizer inner loop training |
| 🤗 RecursiveMAS/Mixture-Outer | Mixture outer loop training |
| 🤗 RecursiveMAS/Deliberation | Deliberation inner & outer loop training |
For complete details, please kindly refer to our training data guide (train/data/README.md).
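As a quick orientation, the sketch below records the table as a lookup and loads one training set with the 🤗 `datasets` library. The dataset ids come from the table. The helper names, the grouping keys, and the idea that each id loads without a configuration name are assumptions. Split and column names are not documented here, so the code only prints whatever it finds.

```python
from typing import Dict, List

# Dataset ids from the table above, grouped by collaboration style.
TRAINING_SETS: Dict[str, List[str]] = {
    "sequential": ["RecursiveMAS/Sequential-Math", "RecursiveMAS/Sequential-Code"],
    "distillation": ["RecursiveMAS/Distillation-Math", "RecursiveMAS/Distillation-Code"],
    "mixture": [
        "RecursiveMAS/Mixture-Math",
        "RecursiveMAS/Mixture-Code",
        "RecursiveMAS/Mixture-Science",
        "RecursiveMAS/Mixture-Summarizer",
        "RecursiveMAS/Mixture-Outer",
    ],
    "deliberation": ["RecursiveMAS/Deliberation"],
}


def datasets_for(style: str) -> List[str]:
    """Return the training-set ids listed for a collaboration style."""
    if style not in TRAINING_SETS:
        raise ValueError(f"unknown style {style!r}; expected one of {sorted(TRAINING_SETS)}")
    return TRAINING_SETS[style]


def inspect(dataset_id: str) -> None:
    """Load a dataset from the Hub and print its splits and columns."""
    from datasets import load_dataset  # lazy import: needs `pip install datasets`

    loaded = load_dataset(dataset_id)  # assumes no config name is required
    for split, data in loaded.items():
        print(split, data.num_rows, data.column_names)


print(datasets_for("sequential"))
```

Running `inspect("RecursiveMAS/Sequential-Math")` with network access is a quick way to check the available splits before pointing `--dataset_name` at a dataset.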
Use inference/run.py to evaluate a released reference system or a locally trained, task-specific configuration.
For example,
```bash
# Evaluate Sequential Light Style RecursiveMAS on Math500
python inference/run.py \
  --style sequential_light \
  --dataset math500 \
  --device cuda \
  --ckpt_override planner=train/ckpts/seq_light/planner_math \
  --ckpt_override critic=train/ckpts/seq_light/refiner_math \
  --ckpt_override solver=train/ckpts/seq_light/solver_math \
  --ckpt_override outer=train/ckpts/seq_light/outer_math
```

| Benchmark | Task | Metric |
|---|---|---|
| `math500` | math reasoning | accuracy |
| `gpqa` | graduate-level science | accuracy |
| `medqa` | medical QA | accuracy |
| `mbppplus` | code generation | test pass rate |
| `aime25`, `aime26` | competition math | pass@10 |
| `livecodebench` | code generation | pass@1 |
| `bamboogle`, `hotpotqa` | open-domain search QA | EM/LLM-as-Judge |
For complete inference and evaluation details, please kindly refer to our inference guide (`inference/README.md`).
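For readers unfamiliar with the metrics in the benchmark table, the sketch below shows two common definitions: the unbiased pass@k estimator widely used for sampled code and math answers, and a normalized exact-match (EM) check. These are standard formulations given as an illustration. This README does not state which variants `inference/run.py` uses, so treat the function names and the normalization rules as assumptions and consult `inference/README.md` for the authoritative definitions.

```python
import re
import string
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generated samples (c of them correct), passes."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """EM after normalization (a common QA convention, assumed here)."""
    return normalize(prediction) == normalize(reference)


print(pass_at_k(n=10, c=3, k=1))
print(exact_match("The Eiffel Tower.", "eiffel tower"))
```

With `k = n`, as in pass@10 computed from exactly 10 samples, the estimator reduces to checking whether any sample is correct.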
To reproduce the paper’s results, train the corresponding collaboration style and data configuration, then run the provided inference pipeline using the resulting checkpoints.
In the following tables, we provide single-run results across different RecursiveMAS collaboration styles and downstream tasks as references.
| math500 | gpqa | medqa | aime25 | aime26 | livecodebench |
|---|---|---|---|---|---|
| 88.5 | 65.7 | 82.7 | 86.7 | 90.0 | 42.1 |

| math500 | gpqa | medqa | mbppplus | aime25 | aime26 |
|---|---|---|---|---|---|
| 78.0 | 32.3 | 32.0 | 37.3 | 33.3 | 20.0 |

| gpqa | medqa | mbppplus | aime26 | livecodebench |
|---|---|---|---|---|
| 68.7 | 82.7 | 72.6 | 86.7 | 43.0 |

| gpqa | medqa | aime26 | livecodebench |
|---|---|---|---|
| 42.7 | 61.3 | 46.7 | 22.8 |

| gpqa | aime26 | bamboogle | hotpotqa |
|---|---|---|---|
| 65.3 | 90.0 | 54.4 | 43.6 |
This project is built upon excellent work from the open-source community, including vLLM, ARPO, and TextGrad.
We welcome discussions and contributions to RecursiveMAS! If you would like to suggest improvements, please feel free to send a pull request or contact us through email!
```bibtex
@misc{recursivemas,
  title={Recursive Multi-Agent Systems},
  author={Xiyuan Yang and Jiaru Zou and Rui Pan and Ruizhong Qiu and Pan Lu and Shizhe Diao and Jindong Jiang and Hanghang Tong and Tong Zhang and Markus J. Buehler and Jingrui He and James Zou},
  year={2026},
  eprint={2604.25917},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2604.25917},
}
```

Please kindly give us a GitHub Star ⭐️ if you find our project helpful!
Thanks a lot for your interest in our project! 😊




