Publication Type : Conference Paper
Publisher : Springer Nature Singapore
Source : Lecture Notes in Networks and Systems
Url : https://doi.org/10.1007/978-981-96-2179-8_17
Campus : Coimbatore
School : School of Artificial Intelligence - Coimbatore
Year : 2025
Abstract : In this paper, we explore the performance of the Bootstrapping Language-Image Pre-training (BLIP) model on Visual Question Answering (VQA) tasks, particularly in the medical domain. We propose an effective approach for medical image analysis by fine-tuning and optimizing the BLIP model with Low-Rank Adaptation (LoRA). The optimized model is evaluated on a combination of benchmark datasets, namely MED-2019, VQA-RAD, and SLAKE-English, to ensure variety in question types and imaging modalities. We obtained an overall test accuracy of 75.67% on VQA-RAD and 80.9% on MED-2019 and SLAKE-English, which highlights the potential of the LoRA-enhanced BLIP model in advancing healthcare solutions.
Cite this Research Publication : M. R. Dinesh Kumar, Pillalamarri Akshaya, R. Saivarsha, N. T. Shrish Surya, B. Premjith, V. Sowmya, G. Jyothish Lal, "Enhanced Vision Language Model for Visual Question Answering in Medical Images," Lecture Notes in Networks and Systems, Springer Nature Singapore, 2025, https://doi.org/10.1007/978-981-96-2179-8_17