ICLR 2026

K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control

Zhe Wu, Donglin Mo, Hongjin Lu, Junliang Xing, Jianheng Liu, Yuheng Jing, Kai Li, Kun Shao, Jianye HAO, Yuanchun Shi


Abstract

Existing mobile device control agents often perform poorly when solving complex tasks requiring long-horizon planning and precise operations, typically due to a lack of relevant task experience or unfamiliarity with skill execution. We propose K²-Agent, a hierarchical framework that models human-like cognition by separating and co-evolving declarative ("knowing what") and procedural ("knowing how") knowledge for planning and execution. K²-Agent's high-level reasoner is bootstrapped from a single demonstration per task and runs a Summarize–Reflect–Locate–Revise (SRLR) loop to distill and iteratively refine task-level declarative knowledge through self-evolution. The low-level executor is trained with our curriculum-guided Group Relative Policy Optimization (C-GRPO), which (i) constructs a balanced sample pool using decoupled reward signals and (ii) employs dynamic demonstration injection to guide the model in autonomously generating successful trajectories for training. On the challenging AndroidWorld benchmark, K²-Agent achieves a new state of the art with a 76.1% success rate, ranking 1st among all methods using only raw screenshots and open-source backbones. Furthermore, K²-Agent shows powerful dual generalization: its high-level declarative knowledge transfers across diverse base models, while its low-level procedural skills achieve competitive performance on unseen tasks in ScreenSpot-v2 and Android-in-the-Wild (AitW).
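
For readers unfamiliar with the Group Relative Policy Optimization step that C-GRPO builds on, the following is a minimal sketch of the standard group-relative advantage computation only. It is not the authors' implementation: the curriculum, the decoupled rewards, and the demonstration injection are not specified in the abstract and are not modeled here. The function name `group_relative_advantages`, the `eps` parameter, and the example rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group.

    Standard GRPO-style step (an illustration, not the paper's C-GRPO):
    advantage_i = (r_i - mean(group)) / (std(group) + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts for one task: two succeed, two fail.
mixed_group = [1.0, 0.0, 0.0, 1.0]
mixed_advantages = group_relative_advantages(mixed_group)

# Hypothetical group where every rollout fails: all advantages are zero,
# so this group contributes no learning signal.
failed_group = [0.0, 0.0, 0.0, 0.0]
failed_advantages = group_relative_advantages(failed_group)
```

In this sketch, a group whose rollouts all receive the same reward yields zero advantages. That may be one reason a balanced sample pool and injected demonstrations are useful, though the abstract does not state this motivation explicitly.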