Researchers Target AI Hallucinations With New Training Techniques for Vision and Medical Models

Two independent studies propose preference-learning approaches to make AI systems more factually reliable

By Line Zotpaper

Published 3 June 2026

Read Time 3 min

Sources 10 outlets

Two research teams have published separate studies this week proposing new training methods to reduce hallucinations in AI models. One targets visual misperception in large vision-language models; the other tackles factual errors in AI-generated clinical summaries. Together they reflect growing urgency in the field to make AI systems more reliable before they are deployed in high-stakes settings.

Hallucinations — instances where AI models generate statements that are unsupported or outright incorrect — have emerged as one of the central reliability challenges facing modern large language and vision-language models. Two papers published on arXiv this week address the problem from different angles, both leveraging a training technique known as Direct Preference Optimization (DPO), which teaches models to favour correct outputs over incorrect ones.

Fixing What AI Models See: P²-DPO

A team led by Ruipeng Zhang, with co-authors including C. L. Philip Chen and Tong Zhang, introduced Perceptual Processing Direct Preference Optimization (P²-DPO), a method designed to address hallucinations that arise specifically from how large vision-language models (LVLMs) perceive and process images.

The researchers identified two key weaknesses in existing approaches. First, models often fail to accurately attend to and interpret specific regions of an image — a problem the paper terms a "perceptual bottleneck." Second, current models lack sufficient robustness when input images are degraded in quality, such as through blurring or noise.

Existing DPO-based training methods rely on preference pairs — examples of good and bad model outputs — that are often assembled without vision-specific grounding, and are typically generated off-policy, meaning they do not reflect the model's own current behaviour. P²-DPO addresses this by having the model generate its own preference pairs, using a method the authors call "Focus-and-Enhance" perception alongside a custom Calibration Loss to better align visual signals with text generation.

The team reports that P²-DPO outperformed strong baseline models on standard benchmarks while using a comparable amount of training data, and demonstrated improved performance on both attended-region accuracy and degraded-image scenarios.

Reducing Clinical Errors: HDSR and HDSR-PL

A separate team, including Shamanth Kuthpadi Seethakantha, Andrew McCallum, and Wael Salloum, focused on a more immediately consequential domain: clinical note summarisation. In healthcare, an AI system that fabricates or distorts medical information could directly affect patient care.

Their approach, Hallucination Detection-Guided Self-Refinement (HDSR), uses a dedicated hallucination detector to guide iterative revisions of AI-generated summaries at inference time — that is, while the model is already in use, rather than only during training. A second, more powerful variant, HDSR-PL, converts the refinement trajectories produced during this process into preference pairs that can be used to fine-tune the model itself.
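The loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `detect` and `revise` callables, the three-round limit, and the rule that a draft with fewer flagged spans is "chosen" over one with more are all assumptions made for illustration, and the toy detector and reviser simply stand in for real models.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, assumed for this sketch only.
Detector = Callable[[str, str], List[str]]      # (source, summary) -> flagged spans
Reviser = Callable[[str, str, List[str]], str]  # (source, summary, flags) -> new summary


def refine(source: str, draft: str, detect: Detector, revise: Reviser,
           max_rounds: int = 3) -> List[Tuple[str, int]]:
    """Revise `draft` until the detector flags nothing or rounds run out.

    Returns the trajectory as (summary, number_of_flags) tuples.
    """
    trajectory = []
    summary = draft
    for _ in range(max_rounds):
        flags = detect(source, summary)
        trajectory.append((summary, len(flags)))
        if not flags:
            return trajectory
        summary = revise(source, summary, flags)
    # Score the last revision, which the loop has not yet checked.
    trajectory.append((summary, len(detect(source, summary))))
    return trajectory


def to_preference_pairs(trajectory: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Build (chosen, rejected) pairs: a later draft with fewer flags is preferred.

    This pairing rule is an assumption, not taken from the paper.
    """
    pairs = []
    for i, (earlier, n_earlier) in enumerate(trajectory):
        for later, n_later in trajectory[i + 1:]:
            if n_later < n_earlier:
                pairs.append((later, earlier))
    return pairs


# Toy stand-ins: flag words absent from the source, then delete one per round.
def toy_detect(source: str, summary: str) -> List[str]:
    known = set(source.split())
    return [word for word in summary.split() if word not in known]


def toy_revise(source: str, summary: str, flags: List[str]) -> str:
    return " ".join(word for word in summary.split() if word != flags[0])


source = "patient admitted with chest pain"
draft = "patient admitted with severe chest pain and fever"
trajectory = refine(source, draft, toy_detect, toy_revise)
pairs = to_preference_pairs(trajectory)
```

In this toy run the trajectory holds four drafts with progressively fewer flagged words, and every cleaner draft is paired against each earlier, more heavily flagged one. Pairs of that shape are the kind of input a preference-learning step such as DPO consumes.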

Testing on real-world clinical notes from the MIMIC-IV-Note v2.2 dataset, the team found that HDSR reduced hallucinations in Llama-3.1-8B-Instruct by 24 percent, while HDSR-PL achieved a 48 percent reduction. Crucially, both methods preserved the fluency, coherence, and relevance of summaries, according to evaluations by both human experts and an LLM-based jury panel. Similar improvements were observed for Gemma-based models.
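A percentage reduction of this kind says little about how many errors remain unless the starting rate is known. The short calculation below makes that concrete. The 24 and 48 percent figures come from the study as reported above, but the baseline rates are purely hypothetical placeholders, as is the assumption that the reduction applies proportionally to a single hallucination rate.

```python
def residual_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Hallucination rate left after a relative (proportional) reduction."""
    return baseline_rate * (1.0 - relative_reduction)


# Reported relative reductions; the baselines are HYPOTHETICAL, not from the paper.
reported = {"HDSR": 0.24, "HDSR-PL": 0.48}
hypothetical_baselines = (0.05, 0.20)

for baseline in hypothetical_baselines:
    for method, reduction in reported.items():
        remaining = residual_rate(baseline, reduction)
        print(f"baseline {baseline:.0%}, {method}: {remaining:.1%} remaining")
```

The same 48 percent reduction leaves very different residual rates depending on the baseline, which is why the absolute figures matter for any clinical assessment.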

A Field in Motion

Both studies reflect a broader shift in AI research toward automated, self-correcting training pipelines that reduce reliance on costly human-labelled data. While neither paper has yet undergone peer review — both are preprints — the results contribute to a rapidly expanding body of work exploring how preference learning can be made more targeted and domain-aware.

The clinical application carries particular weight given regulatory and ethical scrutiny surrounding AI in medicine. Neither study has been evaluated in a live clinical deployment, and independent replication of results remains a standard next step in the research process.

Analysis

Why This Matters

Hallucinations are one of the primary barriers preventing widespread deployment of AI in high-stakes settings such as healthcare, law, and finance; advances in reducing them have direct real-world consequences.
Both studies suggest that automated pipelines, in which models generate or refine their own training signal, can reduce hallucinations without large-scale human annotation, which could accelerate the pace at which safer AI systems are developed and deployed.
The clinical summarisation work in particular signals that domain-specific fine-tuning, not just general-purpose model improvement, may be necessary before AI tools can be trusted in regulated industries.

Background

Hallucination in large language models (LLMs) has been documented since at least 2021, when researchers began systematically cataloguing cases in which models produced confident but factually incorrect outputs. The problem became more publicly visible with the widespread adoption of ChatGPT from late 2022 onward, as users reported fabricated citations, invented facts, and erroneous medical or legal advice.

Direct Preference Optimization, introduced by Rafailov et al. in a 2023 paper, offered a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF) for aligning model behaviour with human preferences. DPO and its variants have since become widely used in model fine-tuning, but researchers have continued to identify limitations — particularly around vision tasks and domain-specific factual accuracy.
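For readers who want to see the mechanics, the per-example DPO objective from the 2023 paper can be sketched in a few lines of plain Python. This is a simplified illustration, not code from either of the new studies: the function name, the example log-probabilities, and the choice of `beta=0.1` are assumptions, and a real training run would compute these quantities over batches with a deep learning framework.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    loss = -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                                - (log pi(y_l|x) - log pi_ref(y_l|x))])

    y_w is the preferred ("chosen") response and y_l the dispreferred
    ("rejected") one; pi is the model being trained and pi_ref a frozen
    reference copy.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)


# Illustrative log-probabilities (made up for this example).
no_change = dpo_loss(-10.0, -12.0, -10.0, -12.0)      # policy identical to reference
favours_chosen = dpo_loss(-8.0, -14.0, -10.0, -12.0)  # policy shifted toward chosen
```

When the policy matches the reference, the loss sits at log 2; it falls as the policy raises the probability of the chosen response relative to the rejected one, which is the sense in which DPO teaches a model to favour one output over another.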

The medical AI space has seen particular scrutiny. High-profile studies have shown that general-purpose LLMs can perform impressively on medical licensing exams while still producing dangerous errors in clinical contexts. Datasets such as MIMIC-IV, which contains de-identified clinical records from Beth Israel Deaconess Medical Center in Boston, have become standard benchmarks for evaluating AI performance in realistic healthcare settings.

Key Perspectives

AI researchers and developers: Both papers argue that self-supervised, automated approaches to preference learning can reduce hallucinations without incurring the costs of large-scale human annotation — a significant practical advantage for teams developing specialised applications.

Healthcare professionals and regulators: While a 48 percent reduction in hallucinations is a meaningful improvement, clinicians and regulators are likely to ask what the residual error rate looks like in absolute terms, and whether AI-generated summaries have been evaluated in real clinical workflows with genuine patient safety oversight.

Critics and AI safety researchers: Preprint results on benchmark datasets do not always translate to real-world performance. The use of LLM-based evaluation panels ("LLM-Jury") to assess output quality is itself contested, as such evaluators may share systematic biases with the models being assessed. Independent replication and prospective clinical trials remain the gold standard.

What to Watch

Whether either study undergoes peer review and replication by independent teams, which would significantly strengthen confidence in the reported results.
Regulatory signals from the US FDA, or under the EU AI Act's implementation framework, regarding acceptable hallucination rates for AI tools used in clinical documentation.
Whether major electronic health record providers or hospital systems begin piloting detection-guided refinement approaches in live environments, which would mark a transition from research to deployment.

Sources

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents — cs.AI updates on arXiv.org
SagnacAssisted Enhanced OTDR for Distributed Acoustic Sensing: A Standardized Benchmark and Engineering Evaluation Framework — cs.AI updates on arXiv.org
EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models — cs.AI updates on arXiv.org
Explainable AI-Driven Cyber Risk Analytics and Model Reliability Assessment for Intelligent Governance of U.S. Critical Infrastructure: An XGBoost and SHAP-Based Intrusion Detection Framework — cs.AI updates on arXiv.org
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents — cs.AI updates on arXiv.org
VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents — cs.AI updates on arXiv.org
Finite Element-Based Material Learning via Automatic Differentiation: Learning constitutive neural network models from full-field deformation data — cs.AI updates on arXiv.org
P²-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization — cs.AI updates on arXiv.org
Hallucination Detection-Guided Preference Optimization for Clinical Summarization — cs.AI updates on arXiv.org
Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation — cs.AI updates on arXiv.org