Publication Type : Journal Article
Publisher : Springer Science and Business Media LLC
Source : The Journal of Supercomputing
Url : https://doi.org/10.1007/s11227-025-07279-7
Campus : Coimbatore
School : School of Artificial Intelligence - Coimbatore
Department : Center for Computational Engineering and Networking (CEN)
Year : 2025
Abstract : This study presents a novel framework for predominant instrument recognition in polyphonic music using Focal Modulation Networks (FMN). Trained on single-instrument data, FMN accurately identifies multiple instruments in unseen test data without sliding window analysis. It replaces self-attention with a focal modulation module, in which focal context aggregation prioritizes key spectral regions and adaptive query modulation refines feature representations. Extensive experiments demonstrate that FMN achieves a macro F1-score of 0.58 and a micro F1-score of 0.66, improvements of 10% and 16%, respectively, over the state-of-the-art Han model. FMN consistently outperforms various transformer architectures, including Vision Transformers (ViT), Swin Transformers (SwinT), and Compact Convolutional Transformers (CCT), with enhanced accuracy and interpretability. Our findings highlight FMN’s potential for capturing complex musical patterns and relationships, establishing a new benchmark for predominant instrument recognition in polyphonic music without the need for aggregation strategies or sliding window approaches.
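Note on the reported metrics : The abstract reports both macro and micro F1-scores. The following is a minimal sketch of how the two averages differ for multi-label predictions. It is not the authors' evaluation code; the function names `f1_from_counts` and `macro_micro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for binary multi-label indicator rows."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for k, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[k] += 1
            elif p:
                fp[k] += 1
            elif t:
                fn[k] += 1
    # Macro: compute F1 per label, then average, so every label counts equally.
    macro = sum(f1_from_counts(tp[k], fp[k], fn[k]) for k in range(n_labels)) / n_labels
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    micro = f1_from_counts(sum(tp), sum(fp), sum(fn))
    return macro, micro


# Toy data (hypothetical): 4 clips, 3 instrument labels.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"{macro:.3f} {micro:.3f}")  # prints "0.500 0.727"
```

In this toy case the rare third label is never predicted, which pulls the macro average down more than the micro average. The same effect can make a macro score lower than a micro score when some classes are harder or less frequent than others.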
Cite this Research Publication : C. R. Lekshmi, Rajeev Rajan, Beyond transformers: hierarchical contextualization and gated aggregation for multiple predominant instrument recognition in polyphonic music, The Journal of Supercomputing, Springer Science and Business Media LLC, 2025, https://doi.org/10.1007/s11227-025-07279-7