Publication Type : Conference Paper
Publisher : Springer Nature Singapore
Source : Lecture Notes in Networks and Systems
Url : https://doi.org/10.1007/978-981-16-6723-7_32
Campus : Coimbatore
School : School of Computing
Department : Computer Science and Engineering
Year : 2022
Abstract :
The objective of the video description, or dense video captioning, task is to generate a description of the video content. The task consists of identifying and describing distinct temporal segments called events. Existing methods utilize relative context to obtain better sentences. In this paper, we propose a hierarchical captioning model that follows an encoder-decoder scheme and consists of two LSTMs for sentence generation. The visual and language information is encoded as context using a bi-directional alteration of a single-stream temporal action proposal network, and this context is utilized in the next stage to produce coherent and contextually aware sentences. The proposed system is tested on the ActivityNet captioning dataset and performs relatively better than other existing approaches.
Cite this Research Publication : Jaivik Dave, S. Padmavathi, Hierarchical Language Modeling for Dense Video Captioning, Lecture Notes in Networks and Systems, Springer Nature Singapore, 2022, https://doi.org/10.1007/978-981-16-6723-7_32