Mobile AI Agents Fail When They Must Infer User Preferences on Their Own
Chen et al. introduce KnowU-Bench, an interactive benchmark that evaluates personalized mobile agents across 192 tasks in a live Android emulation environment. Their key finding: agents that handle explicit instructions well drop below 50% success when they must infer hidden user preferences or decide when to assist proactively, exposing a fundamental gap between interface competence and genuine personal assistance.