
Enhanced Vision Language Model for Visual Question Answering in Medical Images

Publication Type : Conference Paper

Publisher : Springer Nature Singapore

Source : Lecture Notes in Networks and Systems

Url : https://doi.org/10.1007/978-981-96-2179-8_17

Campus : Coimbatore

School : School of Artificial Intelligence - Coimbatore

Year : 2025

Abstract : In this paper, we explore the performance of the Bootstrapping Language Image Pretraining (BLIP) model on Visual Question Answering (VQA) tasks, particularly in the medical domain. We propose an effective approach for medical image analysis by fine-tuning and optimizing the BLIP model with Low-Rank Adaptation (LoRA). The optimized model is evaluated on a combination of benchmark datasets, namely MED-2019, VQA-RAD, and SLAKE-English, for variety and randomness. We obtained an overall test accuracy of 75.67% on VQA-RAD and 80.9% on MED-2019 and SLAKE-English, which highlights the potential of the LoRA-enhanced BLIP model in supporting healthcare solutions.
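The core optimization described in the abstract is LoRA: instead of updating a full pretrained weight matrix W, training learns a low-rank residual BA added to it. A minimal numpy sketch of that update follows; the layer dimensions, rank, and scaling factor are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight of one projection layer (dimensions illustrative).
d_out, d_in, r = 64, 64, 8
W = rng.normal(size=(d_out, d_in))

# LoRA factors: B starts at zero so the adapted layer initially matches
# the pretrained one; during fine-tuning only A and B are updated.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))
alpha = 16.0  # scaling factor (assumed)

def adapted_forward(x):
    # y = W x + (alpha / r) * B (A x) -- low-rank residual update
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B = 0 the adapter is a no-op, as at the start of fine-tuning.
assert np.allclose(adapted_forward(x), W @ x)
```

Because only A and B (r·(d_in + d_out) parameters) are trained while W stays frozen, LoRA fine-tunes a large vision-language model such as BLIP at a small fraction of the full parameter count.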

Cite this Research Publication : M. R. Dinesh Kumar, Pillalamarri Akshaya, R. Saivarsha, N. T. Shrish Surya, B. Premjith, V. Sowmya, G. Jyothish Lal, Enhanced Vision Language Model for Visual Question Answering in Medical Images, Lecture Notes in Networks and Systems, Springer Nature Singapore, 2025, https://doi.org/10.1007/978-981-96-2179-8_17
