Improving acoustic monitoring of biodiversity using deep learning-based source separation algorithms
Passive acoustic monitoring of the environment has been suggested as an effective tool for investigating the dynamics of biodiversity across spatial and temporal scales. Recent developments in automatic recorders allow environmental acoustic data to be collected unattended over long periods. However, a major challenge for acoustic monitoring is identifying the sounds of target taxa in recordings that usually also contain undesired signals from non-target sources. In addition, high variation in the characteristics of target sounds, the co-occurrence of sounds from multiple target taxa, and a lack of reference data make it even more difficult to separate acoustic signals from different sources. To address these challenges, we developed an unsupervised source separation algorithm based on multi-layer (deep) non-negative matrix factorization (NMF). Using reference echolocation calls of 13 bat species, we evaluated the performance of the multi-layer NMF in separating species-specific calls. The multi-layer NMF, especially when pre-trained with reference calls, outperformed the conventional supervised single-layer NMF. We also evaluated the performance of the multi-layer NMF in identifying different types of bat calls in recordings collected in the field, and found that its performance in call-type identification was comparable to that of human observers. These results suggest that the proposed multi-layer NMF approach can effectively separate the acoustic signals of different taxa from long-duration field recordings in an unsupervised manner. The approach can thus improve the applicability of passive acoustic monitoring as a tool for investigating the responses of biodiversity to a changing environment.
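For readers unfamiliar with NMF-based source separation, the following is a minimal sketch of the general idea, not the multi-layer algorithm proposed here. It assumes a plain single-layer NMF fitted with the standard multiplicative updates for the Frobenius-norm objective, applied to a toy non-negative "spectrogram" built from two invented spectral profiles. All names (`nmf`, `separate`, `call_a`, `call_b`) and all numbers are hypothetical and chosen only for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Single-layer NMF, V ~ W H, using multiplicative updates (illustrative only)."""
    rng = random.Random(seed)
    n_freq, n_time = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    H = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


def separate(V, W, H, eps=1e-9):
    """Split V into one spectrogram per component using soft masks."""
    WH = matmul(W, H)
    n_freq, n_time = len(V), len(V[0])
    return [
        [[V[f][t] * W[f][k] * H[k][t] / (WH[f][t] + eps) for t in range(n_time)]
         for f in range(n_freq)]
        for k in range(len(H))
    ]


# Toy data (assumed): two sources with distinct spectral profiles over 6 frequency bins.
call_a = [1.0, 0.8, 0.0, 0.0, 0.0, 0.0]
call_b = [0.0, 0.0, 0.0, 0.5, 1.0, 0.7]
# On/off activations over 8 time frames; both sources are active in frame 3.
act_a = [1, 0, 0, 1, 1, 0, 0, 1]
act_b = [0, 1, 0, 1, 0, 1, 0, 0]
V = [[call_a[f] * act_a[t] + call_b[f] * act_b[t] for t in range(8)] for f in range(6)]

W, H = nmf(V, rank=2)
sources = separate(V, W, H)
print("reconstruction error:", frobenius_error(V, W, H))
```

Here the columns of `W` play the role of spectral basis functions, the rows of `H` give their activations over time, and the soft masks in `separate` divide the observed energy among components. In practice the input would be a magnitude spectrogram of a field recording rather than a hand-built matrix.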