Efficient Agent Training for Computer Use

Yanheng He1,3*, Jiahe Jin1,3*, Pengfei Liu1,2,3+
1Shanghai Jiao Tong University, 2SII,
3Generative AI Research Lab (GAIR)
*Co-first authors +Corresponding author

PC Agent-E completing an example in WindowsAgentArena-V2

PC Agent-E completing an example in OSWorld

Abstract

Scaling up high-quality trajectory data has long been a critical bottleneck for developing human-like computer use agents. We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations. Starting with just 312 human-annotated computer use trajectories, we further improved data quality by synthesizing diverse action decisions with Claude 3.7 Sonnet. Trained on these enriched trajectories, our PC Agent-E model achieved a remarkable 141% relative improvement, surpassing the strong Claude 3.7 Sonnet with extended thinking on WindowsAgentArena-V2, an improved benchmark we also released. Furthermore, PC Agent-E demonstrates strong generalizability to different operating systems on OSWorld. Our findings suggest that strong computer use capabilities can be elicited from a small amount of high-quality trajectory data.

Overview

Figure: Overview of our framework, consisting of four key components: (1) Trajectory Collection, gathering a small set of human trajectories by recording user actions and state observations at each step; (2) Thought Completion, reconstructing the implicit thought process missing from raw human trajectories; (3) Trajectory Boost, diversifying action decisions to further enhance trajectory quality; and (4) Agent Training, developing a strong computer use agent with remarkable data efficiency.
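For intuition, below is a minimal sketch of how a single trajectory step and the Trajectory Boost stage might look in code. This is an assumption-laden illustration rather than the released implementation: the names TrajectoryStep, trajectory_boost, and call_llm are hypothetical placeholders, and the prompt format is invented.

```python
# Minimal sketch (assumptions, not the released code): a trajectory is a sequence of
# (observation, thought, action) steps, and Trajectory Boost queries a stronger model
# for alternative action decisions at each step to diversify the training data.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrajectoryStep:
    """One step of a recorded human trajectory."""
    task: str                  # natural-language task description
    screenshot_path: str       # state observation captured at this step
    thought: str               # reasoning reconstructed by Thought Completion
    action: str                # the human annotator's original action decision
    boosted_actions: List[str] = field(default_factory=list)  # synthesized alternatives


def trajectory_boost(step: TrajectoryStep, n: int, call_llm: Callable[[str], str]) -> None:
    """Ask a model (e.g. Claude 3.7 Sonnet) for n alternative action decisions
    for the same state, enriching the original human decision."""
    prompt = (
        f"Task: {step.task}\n"
        f"Current screen: {step.screenshot_path}\n"
        "Propose one plausible next action with a brief rationale."
    )
    for _ in range(n):
        # call_llm is a placeholder for whatever model client is actually used
        step.boosted_actions.append(call_llm(prompt))
```

In this sketch, each enriched step (the original human action plus the synthesized alternatives) would then be serialized into supervised examples for the Agent Training stage.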

Main Results

Table: Success rate (%) of different models on WindowsAgentArena-V2.

📬 Contact

If you have any questions about this project, feel free to open a GitHub issue.

BibTeX

@article{he2025efficientagenttraining,
      title={Efficient Agent Training for Computer Use},
      author={Yanheng He and Jiahe Jin and Pengfei Liu},
      year={2025},
      journal={arXiv preprint arXiv:2505.13909},
      url={https://arxiv.org/abs/2505.13909}
}