Mobile AI Agents Fail When They Must Infer User Preferences on Their Own
Chen et al. introduce KnowU-Bench, an interactive benchmark that evaluates personalized mobile agents across 192 tasks in a live Android emulation environment. Their key finding: agents that handle explicit instructions well drop below 50% success when they must infer hidden user preferences or decide when to assist proactively, exposing a fundamental gap between interface competence and genuine personal assistance.