
Beyond transformers: hierarchical contextualization and gated aggregation for multiple predominant instrument recognition in polyphonic music

Publication Type : Journal Article

Publisher : Springer Science and Business Media LLC

Source : The Journal of Supercomputing

Url : https://doi.org/10.1007/s11227-025-07279-7

Campus : Coimbatore

School : School of Artificial Intelligence - Coimbatore

Department : Center for Computational Engineering and Networking (CEN)

Year : 2025

Abstract : This study presents a novel framework for predominant instrument recognition in polyphonic music using Focal Modulation Networks (FMN). Trained on single-instrument data, FMN accurately identifies multiple instruments in unseen test data without sliding-window analysis. It replaces self-attention with a focal modulation module, in which focal context aggregation prioritizes key spectral regions and adaptive query modulation refines feature representations. Extensive experiments demonstrate that FMN achieves a macro F1-score of 0.58 and a micro F1-score of 0.66, representing improvements of 10% and 16%, respectively, over the state-of-the-art Han model. FMN consistently outperforms various transformer architectures, including Vision Transformers (ViT), Swin Transformers (SwinT), and Compact Convolutional Transformers (CCT), with enhanced accuracy and interpretability. Our findings highlight FMN’s potential for capturing complex musical patterns and relationships, establishing a new benchmark for predominant instrument recognition in polyphonic music without the need for aggregation strategies or sliding-window approaches.
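The abstract describes the focal modulation module only at a high level. The NumPy sketch below is a rough illustration of the generic operator from the original Focal Modulation Networks formulation (Yang et al., 2022), applied along a 1-D sequence of frame embeddings. Stacked depthwise convolutions with growing kernels provide hierarchical contextualization. A per-frame linear gate weights each context level plus a global average, which is the gated aggregation step. The aggregated context then modulates a query projection element-wise. This is not the authors' implementation. The 1-D time axis, the kernel sizes (3 and 5), the random weights, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def depthwise_conv1d(x, kernel):
    # x: (T, C) frames x channels; kernel: (K, C), one odd-length filter per channel.
    # Zero "same" padding keeps the output length equal to T.
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        out += xp[i:i + x.shape[0]] * kernel[i]
    return out


def init_params(channels, levels=2, base_kernel=3, seed=0):
    # Hypothetical random weights; a trained model would learn these.
    rng = np.random.default_rng(seed)
    s = 1.0 / np.sqrt(channels)
    return {
        "w_q": rng.normal(0.0, s, (channels, channels)),       # query projection
        "w_ctx": rng.normal(0.0, s, (channels, channels)),     # initial context projection
        "w_gate": rng.normal(0.0, s, (channels, levels + 1)),  # one gate per level + global
        # Kernel sizes grow per level (3, 5, ...) to widen the receptive field.
        "dw": [rng.normal(0.0, 0.1, (base_kernel + 2 * l, channels)) for l in range(levels)],
        "w_mod": rng.normal(0.0, s, (channels, channels)),     # 1x1 projection of the modulator
        "w_out": rng.normal(0.0, s, (channels, channels)),     # output projection
    }


def focal_modulation(x, params):
    levels = len(params["dw"])
    q = x @ params["w_q"]
    z = x @ params["w_ctx"]
    gates = x @ params["w_gate"]  # (T, levels + 1)
    aggregated = np.zeros_like(z)
    for l in range(levels):
        # Hierarchical contextualization: each level convolves the previous level's output.
        z = gelu(depthwise_conv1d(z, params["dw"][l]))
        # Gated aggregation: per-frame scalar gate for this level.
        aggregated += gates[:, l:l + 1] * z
    # Global context from average pooling over all frames.
    z_global = gelu(z.mean(axis=0, keepdims=True))
    aggregated += gates[:, levels:levels + 1] * z_global
    modulator = aggregated @ params["w_mod"]
    # Element-wise modulation of the query replaces the self-attention step.
    return (q * modulator) @ params["w_out"]


# Hypothetical input: 128 time frames with 64-dimensional embeddings.
x = np.random.default_rng(1).normal(size=(128, 64))
params = init_params(channels=64, levels=2)
y = focal_modulation(x, params)
print(y.shape)  # (128, 64)
```

Unlike self-attention, no T×T attention matrix is formed here: each context level costs time linear in the number of frames, which is one reason focal modulation is often described as a lighter-weight alternative.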

Cite this Research Publication : C. R. Lekshmi, Rajeev Rajan, Beyond transformers: hierarchical contextualization and gated aggregation for multiple predominant instrument recognition in polyphonic music, The Journal of Supercomputing, Springer Science and Business Media LLC, 2025, https://doi.org/10.1007/s11227-025-07279-7
