Abstract: Research on sound event detection (SED) in environmental settings has seen increased attention in recent years. The large amounts of (private) domestic or urban audio data needed raise significant logistical and privacy concerns. The inherently distributed nature of these tasks makes federated learning (FL) a promising approach to take advantage of large-scale data while mitigating privacy issues. While FL has also seen increased attention recently, to the best of our knowledge there is no research on FL for SED. To address this gap and foster further research in this field, we create and publish novel FL datasets for SED in domestic and urban environments. Furthermore, we provide baseline results on the datasets in an FL context for three deep neural network architectures. The results indicate that FL is a promising approach for SED, but faces challenges with divergent data distributions inherent to distributed client edge devices.