AI Researchers Push Boundaries of Recommendation Systems with New Scaling Techniques

Three separate studies address memory, computation, and data efficiency challenges that limit large-scale personalisation engines

By Zotpaper

Published: 9 June 2026

Read time: 3 min

Sources: 3 outlets

A cluster of research papers published this week describes significant advances in the architecture of large-scale recommendation systems, addressing longstanding bottlenecks in how social media platforms and advertising networks personalise content for billions of users. At least one of the systems is already deployed in production at Meta.

Three papers published on arXiv in June 2026 tackle distinct but related problems in building recommendation systems that can scale to billions of users without collapsing under their own computational weight.

Rethinking Video IDs for Short-Form Platforms

A team including Ruixiao Sun introduced a production-deployed framework for short-form video recommendation, the kind of system that powers feeds on platforms such as TikTok or Instagram Reels. Their core argument is that traditional systems rely on arbitrary numeric video IDs that carry no information about content, which forces platforms to maintain enormous lookup tables and prevents the system from understanding relationships between videos.

Their solution, called Semantic IDs, encodes content meaning directly into the identifier itself. This not only shrinks embedding tables significantly but also helps the system handle cold-start content — newly uploaded videos that have little or no engagement history — by allowing them to share semantic prefixes with established content in similar categories.
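To make the idea of a shared semantic prefix concrete, here is a minimal sketch of one common way to build such identifiers: residual quantisation of a content embedding. The article does not say which method the paper uses, so the technique, the toy codebooks and the function names below are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: List[Vector], v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def semantic_id(embedding: Vector, codebooks: List[List[Vector]]) -> Tuple[int, ...]:
    """Quantise level by level; each level encodes the residual left by the previous one."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


# Hypothetical toy codebooks: level 0 is coarse (topic), level 1 is fine (variation).
CODEBOOKS = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
]

# Hypothetical content embeddings for three videos.
cooking_a = semantic_id((10.9, 10.1), CODEBOOKS)  # established video
cooking_b = semantic_id((10.1, 11.2), CODEBOOKS)  # newly uploaded, similar content
sports = semantic_id((-0.8, 0.1), CODEBOOKS)      # unrelated content

print(cooking_a, cooking_b, sports)
```

In this sketch the two cooking videos receive the same first code, so a new upload can reuse whatever the model has learned for that prefix. If each code gets its own embedding row, the table needs one row per code at each level rather than one row per video.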

To address the second major bottleneck — the quadratic computational cost of processing long user watch histories — the team developed a "Global-Aware Compression Transformer" that condenses long sequences before applying attention. According to their paper, the approach delivers an order-of-magnitude reduction in peak memory usage and a substantial cut in computational overhead in offline testing, enabling longer user history modelling at commercially viable cost. The team reports positive results from large-scale A/B testing on live user engagement metrics.
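The arithmetic behind compressing a sequence before attention can be sketched as follows. Block averaging is used here only as a stand-in: the article does not describe how the paper's compression step works, and the history length and window size are hypothetical.

```python
from typing import List


def compress(history: List[List[float]], window: int) -> List[List[float]]:
    """Average each consecutive block of `window` vectors into one summary vector."""
    summaries = []
    for start in range(0, len(history), window):
        block = history[start:start + window]
        dim = len(block[0])
        summaries.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return summaries


def attention_pairs(sequence_length: int) -> int:
    """Full self-attention scores every position against every other position."""
    return sequence_length * sequence_length


# Hypothetical watch history: 4096 items, each a 2-dimensional embedding.
history = [[float(i), 1.0] for i in range(4096)]
short = compress(history, window=16)

full_cost = attention_pairs(len(history))
compressed_cost = attention_pairs(len(short))
print(len(short), full_cost // compressed_cost)
```

Because attention cost grows with the square of sequence length, shortening the sequence by a factor of 16 cuts the number of pairwise scores by a factor of 256 in this toy setting.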

Scaling Laws Come to Recommendation Systems

Separately, a team led by Bojian Hou introduced Kunlun, an architecture designed to bring the kind of predictable scaling laws familiar from large language model research — where doubling compute reliably improves performance by a known amount — to recommendation systems, which have historically resisted such predictability.
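As an illustration of what a predictable scaling law buys, the sketch below fits a power law of the form loss = a * compute^(-b) to a few small runs and extrapolates to a larger budget. The data points are synthetic and the function names are hypothetical; none of this is taken from the Kunlun paper.

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], loss: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(loss) = log(a) - b * log(compute); returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a: float, b: float, compute: float) -> float:
    """Loss predicted by the fitted power law at a given compute budget."""
    return a * compute ** (-b)


# Synthetic small-scale runs generated from a known law (a = 4.0, b = 0.05).
compute = [1e18, 1e19, 1e20]
loss = [4.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
predicted = predict(a, b, 1e22)  # extrapolate 100x beyond the largest run
print(round(a, 3), round(b, 3), round(predicted, 4))
```

Under a law of this shape, doubling compute multiplies the loss by 2^(-b), which is the sense in which the improvement is known in advance.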

The researchers identified poor Model FLOPs Utilisation (MFU) as the central obstacle: inefficient components meant that adding more computing hardware produced diminishing and unpredictable returns. Kunlun introduces several low-level optimisations including a Generalised Dot-Product Attention mechanism and Hierarchical Seed Pooling, alongside higher-level features for personalisation at the event level. The paper reports that these changes increased MFU from 17 percent to 37 percent on NVIDIA B200 GPUs and doubled scaling efficiency over prior methods. Kunlun is currently deployed in major Meta Ads models.
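For readers unfamiliar with the metric, MFU is commonly defined as the useful model FLOPs executed per second divided by the hardware's theoretical peak. The sketch below applies that definition; the peak throughput and per-step FLOP counts are hypothetical placeholders, not B200 specifications or figures from the paper.

```python
def mfu(model_flops_per_step: float, step_seconds: float, peak_flops_per_second: float) -> float:
    """Fraction of the hardware's theoretical peak spent on useful model computation."""
    return model_flops_per_step / (step_seconds * peak_flops_per_second)


# Hypothetical accelerator peak and workload, chosen to reproduce the reported percentages.
PEAK = 1.0e15  # FLOPs per second (placeholder, not a real device figure)
before = mfu(model_flops_per_step=1.7e14, step_seconds=1.0, peak_flops_per_second=PEAK)
after = mfu(model_flops_per_step=3.7e14, step_seconds=1.0, peak_flops_per_second=PEAK)

# Ratio of useful work done per second on identical hardware.
speedup = after / before
print(round(before, 2), round(after, 2), round(speedup, 2))
```

Read this way, moving from 17 percent to 37 percent means the same hardware does a little more than twice as much useful model computation per second.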

Understanding Data Mixing at Scale

A third paper, authored by Rui Dai and Shuran Zheng, takes a more theoretical approach, proposing a framework to explain why certain mixtures of training data produce better models than others. The researchers extend scaling law theory developed for single-domain language models to multi-domain settings, identifying two governing factors: "Capacity Competition," where finite model capacity forces trade-offs between domains, and "Noise Reduction," where optimal training shifts weight toward harder-to-learn domains.

The team reports their framework outperforms existing methods in predicting effective training mixtures and, crucially, can extrapolate from small-scale experiments to accurately predict optimal data mixes for much larger models — a capability that could substantially reduce the cost of training future systems.

Together, the three papers reflect a maturing field grappling seriously with the engineering realities of deploying AI at internet scale, where theoretical elegance must coexist with strict latency budgets and resource constraints.

Analysis

Why This Matters

Recommendation systems directly shape what billions of people read, watch, and buy every day; improvements in their efficiency and accuracy have outsized societal and commercial consequences.
The deployment of Kunlun inside Meta's advertising infrastructure signals that these research advances are not merely academic — they translate into real changes in how major platforms serve ads and content to users.
Breakthroughs in scaling laws for recommendations could accelerate an arms race among large platforms, widening the gap between well-resourced incumbents and smaller competitors.

Background

Recommendation systems have been a cornerstone of internet platforms since early collaborative filtering algorithms appeared in the 1990s. Over the past decade, deep learning transformed the field, with companies like Google, Meta, Netflix, and ByteDance investing heavily in increasingly complex neural architectures.

The emergence of scaling laws for large language models, most notably in Kaplan et al. (2020) and the Chinchilla paper (Hoffmann et al., 2022), gave language AI researchers a reliable map: a given increase in compute yields a predictable improvement in results. Recommendation systems, which must simultaneously handle sparse user behaviour data, massive item catalogues, and strict real-time latency requirements, have historically resisted this kind of predictability, making resource allocation decisions difficult and wasteful.

The short-form video explosion, driven by TikTok's global growth after 2018 and the subsequent response from Instagram Reels and YouTube Shorts, dramatically intensified the challenge. These platforms require systems that can process watch histories of thousands of items per user while responding in milliseconds, at a scale of hundreds of millions to billions of daily active users.

Key Perspectives

Platform Engineers and Industry Researchers: The production deployments described in two of the three papers — one at an unnamed billion-user platform and one explicitly at Meta — suggest that these techniques address real pain points. Efficiency gains that allow longer user history modelling translate directly into more personalised and engaging feeds, which drives advertising revenue and user retention.

Academic AI Community: The theoretical contribution from Dai and Zheng is notable for attempting to move the field beyond purely empirical trial-and-error. By explaining why certain data mixtures work, their framework could reduce the enormous computational cost of hyperparameter search for future large-scale training runs.

Critics and Skeptics: Advances that make recommendation systems more powerful and efficient also make them more effective at maximising engagement metrics, which critics argue can amplify addictive behaviour, filter bubbles, and the spread of sensational content. None of the three papers addresses these downstream harms, focusing instead on technical performance benchmarks such as engagement rates and computational cost.

What to Watch

Whether Meta publishes further performance data on Kunlun's impact on advertising revenue and user engagement, which would quantify the commercial stakes of recommendation scaling.
Whether more powerful recommendation systems draw new academic and regulatory scrutiny under the EU's Digital Services Act or similar frameworks that require platforms to offer non-personalised alternatives.
Adoption of Semantic ID approaches by other major platforms as an alternative to traditional item-ID systems, particularly for cold-start content challenges faced by newer or smaller video platforms.

Sources

Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design — cs.AI updates on arXiv.org
Explaining Data Mixing Scaling Laws — cs.AI updates on arXiv.org
Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling — cs.AI updates on arXiv.org