Two independent research teams have published advances in 'in-context learning' (ICL), the technique that allows large language models to adapt to unfamiliar tasks by being shown a handful of carefully chosen examples at inference time. One team introduces a mathematically principled method for selecting the most useful examples, while the other proposes a system that adjusts how many examples a model receives for each query and runs roughly 4.6 times faster than prior state-of-the-art methods.
The Problem With Giving AI the Wrong Examples
When engineers deploy a large language model (LLM) on a new task, such as classifying customer complaints or translating legal text, they often lack the time or data to retrain the model. Instead, they rely on in-context learning: inserting a small number of worked examples directly into the model's prompt as guidance. The approach is powerful, but a persistent question has remained largely unresolved: which examples should be chosen, and how many?
Two papers published in mid-2026 take different but complementary approaches to answering that question.
Choosing Better Examples: KITE
Researchers from a multi-institution collaboration introduced KITE (Kernelized and Information Theoretic Exemplars), a framework that treats example selection as a formal optimization problem. Rather than relying on nearest-neighbour retrieval — which simply picks examples most similar to the user's query — KITE models the language model as a linear function over its internal embeddings and seeks the subset of examples that minimises prediction error for the specific query at hand.
The team derived a mathematical objective that is "approximately submodular", a property that allows a simple greedy algorithm to find good solutions with a provable quality guarantee. They also incorporated two enhancements: a "kernel trick", which lets the method work in a high-dimensional feature space without computing the feature vectors explicitly, and a diversity-promoting regulariser to prevent the selected examples from being redundant near-duplicates of one another.
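The greedy selection loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not KITE's published objective: it assumes an RBF kernel, uses the kernel-ridge predictive variance at the query as a stand-in for prediction error, and penalises redundancy by the highest kernel similarity to an already-chosen example. Every name and parameter here (`greedy_select`, `gamma`, `ridge`, `diversity`) is hypothetical.

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF kernel: similarity in an implicit feature space (the 'kernel trick')."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def query_variance(pool, query, idx, gamma, ridge):
    """Kernel-ridge predictive variance at the query given the examples in idx."""
    if not idx:
        return rbf(query, query, gamma)
    K = [[rbf(pool[i], pool[j], gamma) + (ridge if i == j else 0.0) for j in idx]
         for i in idx]
    k_q = [rbf(pool[i], query, gamma) for i in idx]
    alpha = solve(K, k_q)
    return rbf(query, query, gamma) - sum(a * k for a, k in zip(alpha, k_q))

def greedy_select(pool, query, k, gamma=1.0, ridge=1e-2, diversity=0.1):
    """Greedily add the example with the lowest variance-plus-redundancy score."""
    selected = []
    for _ in range(min(k, len(pool))):
        best, best_score = None, float("inf")
        for j in range(len(pool)):
            if j in selected:
                continue
            var = query_variance(pool, query, selected + [j], gamma, ridge)
            # Assumed diversity term: similarity to the closest already-chosen example.
            redundancy = max((rbf(pool[j], pool[i], gamma) for i in selected),
                             default=0.0)
            score = var + diversity * redundancy
            if score < best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

With a pool containing two identical embeddings near the query, this loop picks only one of them and spends its second slot on a different nearby example, which is the behaviour the diversity regulariser is meant to encourage.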
In experiments across multiple text classification benchmarks, KITE outperformed standard retrieval baselines, with the researchers arguing the gains are especially pronounced in real-world, label-scarce settings where poor example selection has the most damaging effect.
Adapting on the Fly: AdapShot
A separate team addressed a related but distinct limitation: most ICL systems use a fixed number of examples regardless of the difficulty of each individual query. Giving a simple query the same 20 examples as a highly complex one either wastes computational resources or buries the query in irrelevant context.
AdapShot, proposed by researchers affiliated with institutions including Sichuan University, introduces a probe-based mechanism that measures the model's output uncertainty — specifically its entropy — to decide dynamically how many examples a given query actually needs. If the model is already confident after seeing a few shots, it stops there. If it remains uncertain, it loads more.
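The stopping rule can be illustrated with a short sketch. It assumes a hypothetical `predict(query, shots)` callable that returns the model's class probabilities, a fixed schedule of shot counts, and an entropy threshold chosen by hand; none of these details come from the AdapShot paper, which may probe and schedule differently.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def choose_shot_count(predict, query, examples, schedule=(2, 4, 8, 16), threshold=0.5):
    """Grow the number of shots until the model's output entropy falls below threshold.

    `predict`, `schedule` and `threshold` are illustrative assumptions.
    """
    used = 0
    for n in schedule:
        used = min(n, len(examples))
        if entropy(predict(query, examples[:used])) <= threshold:
            break  # the model is confident enough; stop adding shots
        if used == len(examples):
            break  # no more examples available
    return used
```

An easy query that yields a confident distribution after two shots stops there, while a query that stays near-uniform keeps stepping through the schedule until the entropy drops or the examples run out.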
Critically, AdapShot addresses the steep computational cost of processing long contexts by reusing previously computed "key-value cache" representations — the internal memory structures LLMs use during inference. The system introduces a method for reordering these cached representations without breaking the model's positional encoding, a technical obstacle that has limited such approaches before. In benchmark testing, AdapShot achieved roughly a 10% average performance improvement over prior state-of-the-art methods while running approximately 4.6 times faster.
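To see why reordering cached entries is delicate, consider a model that uses rotary position embeddings (RoPE). This is an assumption for illustration: the article does not say which positional scheme AdapShot targets or how it performs the reordering. Under RoPE, a cached key has already been rotated by an angle tied to its original position, so moving it to a new slot requires rotating it by the position difference. The function names below are hypothetical.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of vec by position-dependent angles (RoPE-style)."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def move_cached_key(cached_key, old_pos, new_pos):
    """Re-rotate a cached key from old_pos to new_pos without the unrotated key.

    Works because rotations compose: rotating by old_pos and then by
    (new_pos - old_pos) equals rotating by new_pos.
    """
    return rope_rotate(cached_key, new_pos - old_pos)
```

This sketch covers only the positional rotation of keys. In a real model, deeper-layer cache entries also depend on which tokens came before them, so a complete method has to handle that dependence as well.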
Different Angles on a Shared Problem
The two papers approach ICL improvement from different directions — KITE focuses on which examples to select, AdapShot on how many to use and how efficiently to process them — and are therefore more complementary than competing. Whether combining both approaches yields further gains is a question neither paper directly addresses.
Analysis
Why This Matters
- In-context learning is one of the primary ways organisations deploy AI models on specialised tasks without expensive fine-tuning; better ICL directly improves AI reliability and cost-efficiency across industries.
- Computational costs from long contexts are a significant bottleneck in scaling AI services; efficiency gains like AdapShot's 4.6x speedup could reduce inference costs meaningfully at scale.
- Both methods target label-scarce, real-world scenarios — the conditions under which most practical AI deployments actually operate, making the research directly applicable beyond academic benchmarks.
Background
In-context learning became prominent with GPT-3 in 2020, when OpenAI demonstrated that large language models could perform new tasks simply by reading a few examples in their prompt — no weight updates required. The technique democratised AI deployment, allowing teams without machine learning expertise to adapt general-purpose models.
However, early ICL was largely ad hoc, relying on manually curated examples or random sampling. The KATE (Knn-Augmented in-conText Example selection) method, introduced circa 2021-2022, popularised the idea of using semantic similarity, in the form of nearest-neighbour search over embeddings, to automatically retrieve relevant examples. While influential, KATE and its derivatives inherit known weaknesses of nearest-neighbour methods in high dimensions, including a tendency to cluster on similar examples and ignore diversity.
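Nearest-neighbour retrieval of this kind is simple to sketch. The version below assumes the embeddings have already been computed by some encoder and ranks candidates by cosine similarity; it illustrates the general idea rather than reproducing KATE itself, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_select(pool_embeddings, query_embedding, k):
    """Return indices of the k pool examples most similar to the query."""
    ranked = sorted(range(len(pool_embeddings)),
                    key=lambda i: cosine(pool_embeddings[i], query_embedding),
                    reverse=True)
    return ranked[:k]
```

Because each candidate is scored only against the query, near-duplicate examples all rank highly together, which is the clustering weakness noted above.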
The emergence of models with context windows large enough to accommodate dozens or hundreds of examples ("many-shot" ICL) opened new possibilities but also new problems: longer contexts are far more expensive to process, because the cost of standard self-attention grows quadratically with context length, and flooding a model with marginally relevant examples can hurt rather than help performance. The two papers published this week represent the latest wave of research trying to place ICL on more principled theoretical and engineering foundations.
Key Perspectives
Proponents of principled selection (KITE approach): Argue that theoretical grounding — information theory, submodularity, kernel methods — is essential for ICL to move beyond heuristics. Principled methods offer guarantees and generalise more reliably across tasks, particularly when labelled data is scarce.
Efficiency-focused researchers (AdapShot approach): Contend that the practical bottleneck is computational cost, not just selection quality. A theoretically optimal set of 50 examples may be impractical if processing them takes seconds per query; adaptive, cache-aware methods make many-shot ICL viable in production.
Critics and sceptics: Some researchers question whether ICL improvements measured on classification benchmarks transfer to more complex generative tasks like reasoning or code generation. Others note that both methods add system complexity — KITE requires kernel computations, AdapShot requires entropy probing and cache management — which may be difficult to integrate into existing inference pipelines. There is also a broader debate about whether ICL is a long-term paradigm or a transitional technique as fine-tuning becomes cheaper.
What to Watch
- Whether either method demonstrates consistent gains on generative and reasoning benchmarks beyond classification tasks, which would significantly broaden their applicability.
- Adoption by major inference frameworks (such as vLLM or SGLang) of KV cache reuse strategies similar to AdapShot's, which would signal industry validation of the efficiency approach.
- Potential integration of both selection quality and shot count optimisation into a unified framework — a natural next step that neither paper currently addresses.