Enhanced Image Captioning Using CNN and BLIP Models

Publication Type : Conference Paper

Publisher : IEEE

Source : 2025 International Conference on Multi-Agent Systems for Collaborative Intelligence (ICMSCI)

Url : https://doi.org/10.1109/icmsci62561.2025.10894526

Campus : Bengaluru

School : School of Computing

Year : 2025

Abstract : This research focuses on generating image captions using Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models. As deep learning advances, the availability of large datasets and increased computing power make it increasingly feasible to build models capable of creating captions for images. This paper uses CNN and RNN models implemented in Python to achieve this. Image captioning combines image recognition and Natural Language Processing (NLP) to interpret the context of an image and express it in English, drawing on core computer vision principles. This study reviews important concepts in image captioning, including the use of Keras, NumPy, and Jupyter Notebook for development. The paper also explores the use of the Flickr dataset and a CNN for image classification. The BLEU score of the proposed models is found to be close to 60%; with further enhancements, the model could achieve a better score.
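
As a rough illustration of the CNN + LSTM captioning approach the abstract describes, the sketch below shows a common Keras "merge" design: precomputed CNN image features and an LSTM-encoded partial caption are combined to predict the next word. The vocabulary size, caption length, feature dimension, and layer widths are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a CNN-encoder / LSTM-decoder captioning model in Keras.
# All sizes below are assumed for illustration (not from the paper).
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

vocab_size = 8000   # assumed size of the Flickr caption vocabulary
max_length = 34     # assumed maximum caption length, in tokens
feature_dim = 2048  # e.g. pooled features from a pretrained CNN backbone

# Image branch: project precomputed CNN features into the decoder space.
img_in = Input(shape=(feature_dim,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: embed the partial caption and summarize it with an LSTM.
txt_in = Input(shape=(max_length,))
txt_emb = Embedding(vocab_size, 256, mask_zero=True)(txt_in)
txt_vec = LSTM(256)(Dropout(0.5)(txt_emb))

# Merge both branches and predict the next word of the caption.
merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

The BLEU score reported in the abstract can be computed with NLTK's `corpus_bleu`; the reference and candidate captions below are made up for illustration only.

```python
# Hedged sketch of BLEU evaluation for generated captions (illustrative data).
from nltk.translate.bleu_score import corpus_bleu

references = [[["a", "dog", "runs", "on", "the", "grass"],
               ["the", "dog", "is", "running", "outside"]]]
candidates = [["a", "dog", "runs", "on", "grass"]]

# BLEU-1 weights unigram precision only; the paper reports a score near 60%.
score = corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0))
print(f"BLEU-1: {score:.2f}")
```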

Cite this Research Publication : Vambara Tejesh, Supriya M., Enhanced Image Captioning Using CNN and BLIP Models, 2025 International Conference on Multi-Agent Systems for Collaborative Intelligence (ICMSCI), IEEE, 2025, https://doi.org/10.1109/icmsci62561.2025.10894526