Visual Questions Answering Developments, Applications, Datasets and Opportunities: A State-of-the-Art Survey

Publication Type : Conference Paper

Publisher : IEEE

Source : 2023 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS)

Url : https://doi.org/10.1109/icscds56580.2023.10104870

Campus : Haridwar

School : School of Computing

Year : 2023

Abstract : Visual Question Answering (VQA) is an emerging field in Artificial Intelligence (AI) that aims to enable machines to understand and answer questions about visual content. In this survey paper, current state-of-the-art research is extensively surveyed to highlight limitations and futuristic opportunities. Visual QA systems use Natural Language Processing and machine learning techniques to understand and respond to questions posed by users. The paper then reviews the recent advances in neural network-based models and pre-trained language models. The paper also discusses the challenges facing visual QA systems, including the need for large-scale training data, the ability to handle complex and open-ended questions, and the need for robust evaluation metrics. Further, different types of datasets and evaluation metrics used in the literature are summarized, as well as the challenges and open research problems that remain to be addressed. Overall, it is concluded that VQA is a challenging task that requires a combination of visual understanding and natural language processing skills, and that there is still much scope for improvement in terms of accuracy and generalization.

Cite this Research Publication : Harsimran Jit Singh, Gourav Bathla, Munish Mehta, Gunjan Chhabra, Pardeep Singh, Visual Questions Answering Developments, Applications, Datasets and Opportunities: A State-of-the-Art Survey, 2023 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), IEEE, 2023, https://doi.org/10.1109/icscds56580.2023.10104870
