
Cross-Attentive CNNs for Joint Spectral and Pitch Feature Learning in Predominant Instrument Recognition from Polyphonic Music

Publication Type : Journal Article

Publisher : MDPI AG

Source : Technologies

Url : https://doi.org/10.3390/technologies14010003

Campus : Coimbatore

School : School of Engineering

Department : Electronics and Communication

Year : 2025

Abstract : Identifying instruments in polyphonic audio is challenging due to overlapping spectra and variations in timbre and playing styles. This task is central to music information retrieval, with applications in transcription, recommendation, and indexing. We propose a dual-branch Convolutional Neural Network (CNN) that processes Mel-spectrograms and binary pitch masks, fused through a cross-attention mechanism to emphasize pitch-salient regions. On the IRMAS dataset, the model achieves competitive performance with state-of-the-art methods, reaching a micro F1 of 0.64 and a macro F1 of 0.57 with only 0.878M parameters. Ablation studies and t-SNE analyses further highlight the benefits of cross-modal attention for robust predominant instrument recognition.
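Illustrative sketch (not the authors' implementation): the abstract describes fusing a Mel-spectrogram branch and a pitch-mask branch through cross-attention. The NumPy example below shows one common form of that idea, scaled dot-product cross-attention, under several assumptions that the abstract does not confirm: spectral features act as queries, pitch features act as keys and values, the fused output uses a residual sum, and all names, shapes, and weights (`cross_attention`, `w_q`, `w_k`, `w_v`, `T`, `d`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(spec_feats, pitch_feats, w_q, w_k, w_v):
    """Scaled dot-product cross-attention between two feature sequences.

    spec_feats:  (T, d) features from a hypothetical spectrogram branch.
    pitch_feats: (T, d) features from a hypothetical pitch-mask branch.
    Queries come from the spectral branch; keys and values come from the
    pitch branch (an assumption, the paper may use a different layout).
    """
    q = spec_feats @ w_q
    k = pitch_feats @ w_k
    v = pitch_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T, T) similarity matrix
    weights = softmax(scores, axis=-1)        # each row sums to 1
    attended = weights @ v                    # pitch-informed context
    fused = spec_feats + attended             # residual fusion (assumption)
    return fused, weights

# Random stand-ins for the outputs of the two CNN branches.
rng = np.random.default_rng(0)
T, d = 8, 16
spec = rng.standard_normal((T, d))
pitch = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused, weights = cross_attention(spec, pitch, w_q, w_k, w_v)
```

Each row of `weights` indicates how strongly one spectral position attends to each pitch position, which is the sense in which such a mechanism can emphasize pitch-salient regions.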

Cite this Research Publication : Lekshmi Chandrika Reghunath, Rajeev Rajan, Christian Napoli, Cristian Randieri, Cross-Attentive CNNs for Joint Spectral and Pitch Feature Learning in Predominant Instrument Recognition from Polyphonic Music, Technologies, MDPI AG, 2025, https://doi.org/10.3390/technologies14010003
