Hallucinations — instances where AI models generate statements that are unsupported or outright incorrect — have emerged as one of the central reliability challenges facing modern large language and vision-language models. Two papers published on arXiv this week address the problem from different angles, both leveraging a training technique known as Direct Preference Optimization (DPO), which teaches models to favour correct outputs over incorrect ones.
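For readers who want to see what "favouring correct outputs" means numerically, the following is a minimal sketch of the DPO loss for a single preference pair, as it is commonly formulated. The function and argument names are illustrative assumptions, not taken from either paper. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (illustrative names, standard formula)."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two margins; beta is an assumed default.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)) computed as a numerically stable softplus(-logits).
    z = -logits
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# The loss falls as the policy shifts probability toward the chosen response.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(neutral, improved)
```

When the policy and the reference agree exactly, the loss equals log 2; it shrinks as the policy raises the chosen response relative to the rejected one.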
Fixing What AI Models See: P²-DPO
A team led by Ruipeng Zhang, with colleagues including C. L. Philip Chen and Tong Zhang, introduced Perceptual Processing Direct Preference Optimization (P²-DPO), a method designed to address hallucinations that arise specifically from how large vision-language models (LVLMs) perceive and process images.
The researchers identified two key weaknesses in existing approaches. First, models often fail to accurately attend to and interpret specific regions of an image — a problem the paper terms a "perceptual bottleneck." Second, current models lack sufficient robustness when input images are degraded in quality, such as through blurring or noise.
Existing DPO-based training methods rely on preference pairs — examples of good and bad model outputs — that are often assembled without vision-specific grounding, and are typically generated off-policy, meaning they do not reflect the model's own current behaviour. P²-DPO addresses this by having the model generate its own preference pairs, using a method the authors call "Focus-and-Enhance" perception alongside a custom Calibration Loss to better align visual signals with text generation.
The team reports that P²-DPO outperformed strong baseline models on standard benchmarks while using a comparable amount of training data, and demonstrated improved performance on both attended-region accuracy and degraded-image scenarios.
Reducing Clinical Errors: HDSR and HDSR-PL
A separate team, including Shamanth Kuthpadi Seethakantha, Andrew McCallum, and Wael Salloum, focused on a more immediately consequential domain: clinical note summarisation. In healthcare, an AI system that fabricates or distorts medical information could directly affect patient care.
Their approach, Hallucination Detection-Guided Self-Refinement (HDSR), uses a dedicated hallucination detector to guide iterative revisions of AI-generated summaries at inference time — that is, while the model is already in use, rather than only during training. A second, more powerful variant, HDSR-PL, converts the refinement trajectories produced during this process into preference pairs that can be used to fine-tune the model itself.
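The general shape of such a pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the stopping rule, the pairing rule, and the toy detector and reviser are all hypothetical stand-ins for the trained models a real system would use.

```python
from typing import Callable, Dict, List, Tuple

Detector = Callable[[str, str], List[str]]      # (source, summary) -> flagged spans
Reviser = Callable[[str, str, List[str]], str]  # (source, summary, flagged) -> new summary


def refine_with_detector(source: str, draft: str, detect: Detector,
                         revise: Reviser, max_rounds: int = 3
                         ) -> List[Tuple[str, int]]:
    """Revise a summary until the detector flags nothing or rounds run out.

    Returns the trajectory as (summary, number_of_flagged_spans) tuples.
    """
    trajectory = []
    summary = draft
    for round_idx in range(max_rounds + 1):
        flagged = detect(source, summary)
        trajectory.append((summary, len(flagged)))
        if not flagged or round_idx == max_rounds:
            break
        summary = revise(source, summary, flagged)
    return trajectory


def trajectory_to_pairs(trajectory: List[Tuple[str, int]]) -> List[Dict[str, str]]:
    """Assumed pairing rule: a later revision with fewer flags beats an earlier one."""
    pairs = []
    for i, (earlier, n_early) in enumerate(trajectory):
        for later, n_late in trajectory[i + 1:]:
            if n_late < n_early:
                pairs.append({"chosen": later, "rejected": earlier})
    return pairs


# Toy stand-ins so the sketch runs: flag sentences using words absent from the source.
def toy_detect(source: str, summary: str) -> List[str]:
    source_words = set(source.lower().replace(".", "").split())
    return [s for s in summary.split(". ")
            if any(w not in source_words for w in s.lower().replace(".", "").split())]


def toy_revise(source: str, summary: str, flagged: List[str]) -> str:
    return ". ".join(s for s in summary.split(". ") if s not in flagged)


note = "Patient reports chest pain. No fever."
draft = "Patient reports chest pain. Patient has diabetes."
steps = refine_with_detector(note, draft, toy_detect, toy_revise)
print(trajectory_to_pairs(steps))
```

In this toy run the unsupported sentence about diabetes is flagged and dropped in one round, and the resulting pair marks the cleaned summary as preferred over the original draft. Pairs of this kind are what a preference-learning step would then consume.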
Testing on real-world clinical notes from the MIMIC-IV-Note v2.2 dataset, the team found that HDSR reduced hallucinations in Llama-3.1-8B-Instruct by 24 percent, while HDSR-PL achieved a 48 percent reduction. Crucially, both methods preserved the fluency, coherence, and relevance of summaries, according to evaluations by both human experts and an LLM-based jury panel. Similar improvements were observed for Gemma-based models.
A Field in Motion
Both studies reflect a broader shift in AI research toward automated, self-correcting training pipelines that reduce reliance on costly human-labelled data. Both papers are preprints that have not yet undergone peer review, but the results contribute to a rapidly expanding body of work exploring how preference learning can be made more targeted and domain-aware.
The clinical application carries particular weight given regulatory and ethical scrutiny surrounding AI in medicine. Neither study has been evaluated in a live clinical deployment, and independent replication of results remains a standard next step in the research process.