NLP

cs.CL — natural language processing

3 articles

LLMs Simulating Human Behavior Converge Toward an Unrealistically Positive "Average Person"

Chen et al. introduce OmniBehavior, the first user simulation benchmark constructed entirely from real-world data spanning multiple scenarios and long time horizons. Their evaluation of state-of-the-art LLMs reveals a fundamental structural bias: models consistently simulate an overly active, homogenized, and optimistic version of human behavior, losing the diversity and idiosyncrasies present in authentic behavioral data.

12 Apr·3 min

Language Models Learn Skills in a Predictable, Compositional Order During Training

Liu et al. propose and test the "Implicit Curriculum Hypothesis": that models acquire skills during pretraining in a predictable, compositional order rather than an arbitrary one. By tracking when specific capabilities emerge across four model families (410M–13B parameters), they find highly consistent orderings (Spearman ρ = 0.81 across 45 model pairs) and show that composite tasks reliably emerge after their constituent subtasks.

12 Apr·3 min

AI Agents Struggle with Everyday Online Tasks on Live Websites

Zhang et al. introduce ClawBench, a benchmark that tests AI agents on 153 everyday online tasks, such as booking appointments and submitting applications, across 144 live websites. Even the best model, Claude Sonnet 4.6, completes only 33.3% of the tasks, revealing significant gaps in current AI capabilities for real-world web automation.

12 Apr·2 min