Publication Type : Conference Paper
Publisher : IEEE
Source : 2024 IEEE 9th International Conference for Convergence in Technology (I2CT)
Url : https://doi.org/10.1109/i2ct61223.2024.10544337
Campus : Bengaluru
School : School of Computing
Year : 2024
Abstract : In the rapidly evolving landscape of computer vision, image captioning has emerged as a challenging task. This report explores techniques for image captioning using deep learning, especially the encoder-decoder framework. Models such as CNN-LSTM, CNN-GRU, Xception–YOLOv4, and a GIT-based model are used. Recognition is given to the significance of ample, well-annotated datasets in teaching the algorithms to understand the complex relationships between visual elements and textual descriptions. Along with traditional evaluation metrics like the BLEU score, this study also employs metrics such as METEOR, ROUGE-L, and SPICE to compare performance across models. The findings highlight the impact of deep learning in enabling computers to generate captions for diverse visual content.
Cite this Research Publication : K. Kushal, Madhav Manoj, Kevansh Reddy, Priyanka C. Nair, "An In-Depth Exploration of Image Captioning Training Approaches and Performance Analysis," 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), IEEE, 2024, https://doi.org/10.1109/i2ct61223.2024.10544337