LLMs Simulating Human Behavior Converge Toward an Unrealistically Positive "Average Person"
Chen et al. introduce OmniBehavior, which they present as the first user simulation benchmark built entirely from real-world data spanning multiple scenarios and long time horizons. Their evaluation of state-of-the-art LLMs reveals a fundamental structural bias: models consistently simulate an overly active, homogenized, and optimistic version of human behavior, one that erases the diversity and individual idiosyncrasies found in authentic behavioral data.