Abstract:Despite surveillance systems are becoming increasingly ubiquitous in our living environment, automated surveillance, currently based on video sensory modality and machine intelligence, lacks most of the time the robustness and reliability required in several real applications. To tackle this issue, audio sensory devices have been taken into account, both alone or in combination with video, giving birth, in the last decade, to a considerable amount of research. In this paper audio-based automated surveillance methods are organized into a comprehensive survey: a general taxonomy, inspired by the more widespread video surveillance field, is proposed in order to systematically describe the methods covering background subtraction, event classification, object tracking and situation analysis. For each of these tasks, all the significant works are reviewed, detailing their pros and cons and the context for which they have been proposed. Moreover, a specific section is devoted to audio features, discussing their expressiveness and their employment in the above described tasks. Differently, from other surveys on audio processing and analysis, the present one is specifically targeted to automated surveillance, highlighting the target applications of each described methods and providing the reader tables and schemes useful to retrieve the most suited algorithms for a specific requirement.