Research Info

Title
Enhancing chest X-ray report generation with pathology-guided prompts and vision-language shortcut bias mitigation
Type Article
Keywords
Vision-language models, Chest X-ray, Medical image analysis, Trustworthy AI
Abstract
Writing chest X-ray reports remains a complex and time-consuming task. This work addresses the challenge of automated chest X-ray report generation by leveraging recent advances in vision-language models (VLMs) and proposing a novel framework that integrates prompt-guided supervision and shortcut-bias mitigation into a fine-tuned VLM (BLIP), enhancing both the accuracy and trustworthiness of the generated medical reports. Methods for enhancing text coherence and mitigating shortcut bias in VLMs are investigated using pathological prompts and generative approaches, without modifying the model architecture or requiring multi-objective training; such approaches have been largely overlooked in the existing literature. Pathological labels are used as natural-language prompts to guide language modeling, and the model's robustness is improved through curriculum learning by introducing controlled label noise during training. To mitigate shortcut bias, in which spurious visual-textual correlations (e.g., support devices) can mislead the model, a multi-modal bias-mitigation strategy is proposed: image artifacts are removed using a generative diffusion model and the corresponding texts are refined using a large language model, yielding more causally grounded representations. Experiments are conducted on the newly released CheXpert Plus dataset, demonstrating improvements in report quality and robustness. On the test set, the framework improves over the baseline by 15.5% in ROUGE-L, 20% in METEOR, 29% in RadGraph, 44% in 1/RadCliQ, and 16.8%-34.5% in BLEU. Furthermore, the multi-modal shortcut-bias mitigation method improves the clinical coherency of the generated reports while shifting the model's focus toward more relevant regions of the image. These findings contribute to the development of safer and more trustworthy AI systems in radiology, offering a scalable strategy for enhancing vision-language models. The source code is publicly available.
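The abstract's prompt-noise curriculum can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper `noisy_prompt`, the pathology names, the prompt wording, and the linear noise-decay schedule (noisy early, clean late) are all assumptions introduced here for clarity.

```python
import random

def noisy_prompt(labels, epoch, max_epochs, max_noise=0.3, seed=None):
    """Build a pathology prompt from binary labels, flipping each label with a
    probability that decays linearly over training (a hypothetical curriculum:
    high controlled noise early, clean supervision late)."""
    rng = random.Random(seed)
    p = max_noise * (1.0 - epoch / max_epochs)  # assumed linear decay schedule
    # Flip each label independently with probability p (controlled label noise).
    noisy = {name: (not val if rng.random() < p else val)
             for name, val in labels.items()}
    present = [name for name, val in noisy.items() if val]
    body = ", ".join(present) if present else "no acute findings"
    return "Findings suggest: " + body + "."

# Example: at the final epoch the noise probability reaches zero,
# so the prompt reflects the true labels.
prompt = noisy_prompt({"Edema": True, "Cardiomegaly": False},
                      epoch=10, max_epochs=10, seed=0)
```

Such a prompt would be prepended to the report-generation input at each training step, so the language model learns to condition on pathology labels while tolerating imperfect ones.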
Researchers Mohammad Barzegar (First researcher), Habib Rostami (Second researcher), Amir Sanati (Third researcher), Rezvan Afshoon (Fourth researcher), Abolghasem Kosari (Fifth researcher), Ahmad Keshavarz (Not in first six researchers)