Publication Type : Journal Article
Publisher : MDPI AG
Source : Technologies
Url : https://doi.org/10.3390/technologies14010003
Campus : Coimbatore
School : School of Engineering
Department : Electronics and Communication
Year : 2025
Abstract : Identifying instruments in polyphonic audio is challenging due to overlapping spectra and variations in timbre and playing styles. This task is central to music information retrieval, with applications in transcription, recommendation, and indexing. We propose a dual-branch Convolutional Neural Network (CNN) that processes Mel-spectrograms and binary pitch masks, fused through a cross-attention mechanism to emphasize pitch-salient regions. On the IRMAS dataset, the model performs competitively with state-of-the-art methods, reaching a micro F1 of 0.64 and a macro F1 of 0.57 with only 0.878M parameters. Ablation studies and t-SNE analyses further highlight the benefits of cross-modal attention for robust predominant instrument recognition.
Cite this Research Publication : Lekshmi Chandrika Reghunath, Rajeev Rajan, Christian Napoli, Cristian Randieri, Cross-Attentive CNNs for Joint Spectral and Pitch Feature Learning in Predominant Instrument Recognition from Polyphonic Music, Technologies, MDPI AG, 2025, https://doi.org/10.3390/technologies14010003
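Note on the reported metrics : The abstract reports both a micro F1 and a macro F1. The two differ in how per-instrument results are averaged. The following is a minimal sketch of that difference in plain Python, not the authors' evaluation code. The function names, the three instrument labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Multi-label micro and macro F1 over a fixed label list."""
    # Per-label counts stored as [true positives, false positives, false negatives].
    counts = {label: [0, 0, 0] for label in labels}
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1

    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    # Micro: pool the counts over all labels first, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)

    return {"micro": micro, "macro": macro}


# Hypothetical toy data: four clips, three made-up instrument labels.
labels = ["pia", "vio", "flu"]
y_true = [{"pia"}, {"pia", "vio"}, {"pia"}, {"flu"}]
y_pred = [{"pia"}, {"pia"}, {"pia"}, {"vio"}]

scores = micro_macro_f1(y_true, y_pred, labels)
print(scores)
```

In this toy case the frequent label is always predicted correctly and the two rare labels are always missed, so the micro score (2/3) is well above the macro score (1/3). A gap in the same direction, as in the 0.64 versus 0.57 reported above, is what one would expect when performance is uneven across instrument classes, although the abstract itself does not break the scores down per class.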