Audio-Based Video Segmentation for Long Duration Videos Using Triplet-Loss Based Sentence Transformers and Acoustic Characteristics

Publication Type : Conference Proceedings

Publisher : Springer Nature Switzerland

Source : Communications in Computer and Information Science

Url : https://doi.org/10.1007/978-3-031-79041-6_22

Campus : Bengaluru

School : School of Engineering

Department : Electronics and Communication

Year : 2025

Abstract : Online lecture videos have become very popular since the pandemic. However, instead of going through an entire video, students and research communities sometimes focus only on the few important parts of the content it presents, which has led to a new research area: video segmentation. This paper presents a novel way to find and navigate to specific important parts of lecture videos. Several methods exist for lecture video segmentation. The proposed model represents the lecture videos' audio transcripts using triplet-loss based Sentence Transformers, combined with the acoustic characteristics of the audio. A genetic algorithm is used to find the best video partitioning. The dataset used in this work consists of 334 videos, of which 25 selected videos are of long duration, which increases the complexity of video segmentation. Each video is approximately 52 min long. The proposed work attained a precision of 0.51, a recall of 0.52 and an F1-score of 0.498, which is reasonably acceptable compared with the various state-of-the-art techniques proposed in the literature.
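The triplet loss mentioned in the abstract trains an embedding model so that an anchor sentence lies closer to a related ("positive") sentence than to an unrelated ("negative") one by at least a margin. The sketch below illustrates only that general hinge formulation, max(0, d(a, p) - d(a, n) + margin), with Euclidean distance; it is not the authors' implementation. The margin value, the distance metric, the toy 2-D vectors, and the idea that positives come from the same lecture topic while negatives come from a different topic are all illustrative assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is. The default margin is a
    hypothetical choice for illustration.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(triplets: List[Tuple[Vector, Vector, Vector]],
                       margin: float = 1.0) -> float:
    """Mean triplet loss over (anchor, positive, negative) tuples."""
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


# Toy 2-D "sentence embeddings" (hypothetical values). Assumption: anchor and
# positive come from the same lecture topic, negatives from a different topic.
anchor = [0.0, 0.0]
positive = [0.0, 0.5]       # distance 0.5 from the anchor
negative = [3.0, 4.0]       # distance 5.0 from the anchor: easy negative
hard_negative = [0.0, 1.0]  # distance 1.0 from the anchor: violates the margin

easy = triplet_loss(anchor, positive, negative)       # 0.5 - 5.0 + 1.0 < 0, so 0.0
hard = triplet_loss(anchor, positive, hard_negative)  # 0.5 - 1.0 + 1.0 = 0.5
mean = batch_triplet_loss([(anchor, positive, negative),
                           (anchor, positive, hard_negative)])  # (0.0 + 0.5) / 2

print(easy, hard, mean)
```

Only the hard negative contributes to the loss, so training pushes such sentences farther from the anchor. Embeddings trained this way can then be compared across adjacent transcript windows when choosing segment boundaries. The sentence-transformers library exposes the same hinge formulation through its `TripletLoss` class.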

Cite this Research Publication : M. Vasuki, Rimjhim Padam Singh, Susmitha Vekkot, Vivek Venugopal, Audio-Based Video Segmentation for Long Duration Videos Using Triplet-Loss Based Sentence Transformers and Acoustic Characteristics, Communications in Computer and Information Science, Springer Nature Switzerland, 2025, https://doi.org/10.1007/978-3-031-79041-6_22

Admissions Apply Now