Abstract:The issue of domain shift remains a problematic phenomenon in most real-world datasets and clinical audio is no exception. In this work, we study the nature of domain shift in a clinical database of infant cry sounds acquired across different geographies. We find that though the pitches of infant cries are similarly distributed regardless of the place of birth, other characteristics introduce peculiar biases into the data. We explore methodologies for mitigating the impact of domain shift in a model for identifying neurological injury from cry sounds. We adapt unsupervised domain adaptation methods from computer vision which learn an audio representation that is domain-invariant to hospitals and is task discriminative. We also propose a new approach, target noise injection (TNI), for unsupervised domain adaptation which requires neither labels nor training data from the target domain. Our best-performing model significantly improves target accuracy by 7.2%, without negatively affecting the source domain.
Abstract:Since the 1960s, neonatal clinicians have known that newborns suffering from certain neurological conditions exhibit altered crying patterns such as the high-pitched cry in birth asphyxia. Despite an annual burden of over 1.5 million infant deaths and disabilities, early detection of neonatal brain injuries due to asphyxia remains a challenge, particularly in developing countries where the majority of births are not attended by a trained physician. Here we report on the first inter-continental clinical study to demonstrate that neonatal brain injury can be reliably determined from recorded infant cries using an AI algorithm we call Roseline. Previous and recent work has been limited by the lack of a large, high-quality clinical database of cry recordings, constraining the application of state-of-the-art machine learning. We develop a new training methodology for audio-based pathology detection models and evaluate this system on a large database of newborn cry sounds acquired from geographically diverse settings -- 5 hospitals across 3 continents. Our system extracts interpretable acoustic biomarkers that support clinical decisions and is able to accurately detect neurological injury from newborns' cries with an AUC of 92.5% (88.7% sensitivity at 80% specificity). Cry-based neurological monitoring opens the door for low-cost, easy-to-use, non-invasive and contact-free screening of at-risk babies, especially when integrated into simple devices like smartphones or neonatal ICU monitors. This would not only provide a reliable tool where there are no alternatives, but also curtail the need to regularly subject newborns to physically exhausting or radiation-exposing assessments such as brain CT scans. This work sets the stage for embracing the infant cry as a vital sign and indicates the potential of AI-driven sound monitoring for the future of affordable healthcare.
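The operating point quoted above (sensitivity at a fixed specificity) can be read off a set of classifier scores as in the minimal sketch below. The function name, the score convention (higher means more likely injured) and the toy data are assumptions for illustration only; they are not taken from the study.

```python
def sensitivity_at_specificity(scores, labels, target_specificity=0.80):
    """Best sensitivity achievable while keeping specificity >= target.

    Assumed convention: labels are 1 (positive) or 0 (negative), and an
    example is predicted positive when its score is >= the threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    best = 0.0
    # Try every observed score as a threshold, plus one that rejects everything.
    for threshold in sorted(set(scores)) + [float("inf")]:
        true_negatives = sum(1 for s in negatives if s < threshold)
        specificity = true_negatives / len(negatives)
        if specificity >= target_specificity:
            true_positives = sum(1 for s in positives if s >= threshold)
            best = max(best, true_positives / len(positives))
    return best


# Toy data, invented for illustration.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
toy_result = sensitivity_at_specificity(toy_scores, toy_labels, 0.80)
```

Fixing specificity first and reporting the resulting sensitivity gives one comparable number per model, which is how the 88.7% at 80% figure above is phrased.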
Abstract:This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries, and the accompanying CryCeleb 2023 task - a public speaker verification challenge based on infant cry sounds. We release for academic usage more than 6 hours of manually segmented cry sounds from 786 newborns to encourage research in infant cry analysis.
Abstract:Recurrent neural networks (RNNs) are powerful tools for sequential modeling, but typically require significant overparameterization and regularization to achieve optimal performance. This leads to difficulties in the deployment of large RNNs in resource-limited settings, while also introducing complications in hyperparameter selection and training. To address these issues, we introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell using a lightweight tensor-train (TT) factorization. This approach represents a novel form of weight sharing which reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs. Experiments on image classification and speaker verification tasks demonstrate further benefits for reducing inference times and stabilizing model training and hyperparameter selection.
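To give a sense of where the size reduction comes from, the sketch below compares the parameter count of a dense weight matrix with that of the same matrix stored as tensor-train cores. It assumes the standard TT-matrix layout, with one core of shape (r_{k-1}, m_k, n_k, r_k) per factor, and covers a single matrix only; the joint encoding of all cell matrices described in the abstract is not reproduced here. The factor sizes and ranks are hypothetical.

```python
from math import prod


def dense_params(row_factors, col_factors):
    """Parameters of a dense matrix of shape (prod(row_factors), prod(col_factors))."""
    return prod(row_factors) * prod(col_factors)


def tt_matrix_params(row_factors, col_factors, ranks):
    """Parameters of the same matrix stored as TT cores.

    Core k is assumed to have shape (ranks[k], row_factors[k], col_factors[k],
    ranks[k + 1]), with boundary ranks equal to 1.
    """
    d = len(row_factors)
    if len(col_factors) != d or len(ranks) != d + 1:
        raise ValueError("need d row factors, d column factors and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * row_factors[k] * col_factors[k] * ranks[k + 1]
        for k in range(d)
    )


# Hypothetical example: a 256 x 256 matrix, factored as 4*4*4*4 on each side.
rows = cols = (4, 4, 4, 4)
tt_ranks = (1, 4, 4, 4, 1)
n_dense = dense_params(rows, cols)
n_tt = tt_matrix_params(rows, cols, tt_ranks)
```

In this toy setting the dense matrix has 65,536 parameters and the TT form has 640. The ratio grows with matrix size and shrinks as the ranks increase.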
Abstract:Despite continuing medical advances, the rate of newborn morbidity and mortality globally remains high, with over 6 million casualties every year. The prediction of pathologies affecting newborns based on their cry is thus of significant clinical interest, as it would facilitate the development of accessible, low-cost diagnostic tools. However, the inadequacy of clinically annotated datasets of infant cries limits progress on this task. This study explores a neural transfer learning approach to developing accurate and robust models for identifying infants that have suffered from perinatal asphyxia. In particular, we explore the hypothesis that representations learned from adult speech could inform and improve performance of models developed on infant speech. Our experiments show that models based on such representation transfer are resilient to different types and degrees of noise, as well as to signal loss in time and frequency domains.
Abstract:Extremely preterm infants often require endotracheal intubation and mechanical ventilation during the first days of life. Due to the detrimental effects of prolonged invasive mechanical ventilation (IMV), clinicians aim to extubate infants as soon as they deem them ready. Unfortunately, existing strategies for prediction of extubation readiness vary across clinicians and institutions, and lead to high reintubation rates. We present an approach using Random Forest classifiers for the analysis of cardiorespiratory variability to predict extubation readiness. We address the issue of data imbalance by employing random undersampling of examples from the majority class before training each Decision Tree in a bag. By incorporating clinical domain knowledge, we further demonstrate that our classifier could have identified 71% of infants who failed extubation, while maintaining a success detection rate of 78%.
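The per-tree undersampling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes that each tree sees all minority-class examples plus an equally sized random subset of the majority class, drawn without replacement. The abstract does not say whether a bootstrap is drawn in addition.

```python
import random


def balanced_sample_indices(labels, rng):
    """Indices for one tree: every class is cut down to the minority-class size."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_minority = min(len(indices) for indices in by_class.values())

    sample = []
    for indices in by_class.values():
        # Sampling without replacement keeps all minority examples and
        # a random subset of the majority class.
        sample.extend(rng.sample(indices, n_minority))
    rng.shuffle(sample)
    return sample


def samples_for_forest(labels, n_trees, seed=0):
    """One independently undersampled index set per tree in the ensemble."""
    rng = random.Random(seed)
    return [balanced_sample_indices(labels, rng) for _ in range(n_trees)]


# Hypothetical imbalance: 90 successful extubations (0) and 10 failures (1).
toy_labels = [0] * 90 + [1] * 10
toy_bags = samples_for_forest(toy_labels, n_trees=5)
```

Because each tree draws a different majority subset, the ensemble as a whole still sees most of the majority class while every individual tree trains on balanced data.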
Abstract:Extremely preterm infants commonly require intubation and invasive mechanical ventilation after birth. While the duration of mechanical ventilation should be minimized in order to avoid complications, extubation failure is associated with increases in morbidities and mortality. As part of a prospective observational study aimed at developing an accurate predictor of extubation readiness, Markov and semi-Markov chain models were applied to gain insight into the respiratory patterns of these infants, with more robust time-series modeling using semi-Markov models. The semi-Markov model revealed interesting similarities and differences between newborns who succeeded extubation and those who failed. The parameters of the model were further applied to predict extubation readiness via generative (joint likelihood) and discriminative (support vector machine) approaches. Results showed that up to 84% of infants who failed extubation could have been accurately identified prior to extubation.
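As a rough illustration of the generative route, the sketch below fits one first-order Markov chain per outcome group and labels a new sequence by whichever chain gives it the higher log-likelihood. It is a simplified stand-in, assuming plain Markov chains with add-one smoothing and ignoring the initial-state distribution; the state labels ("B" for regular breathing, "P" for pause) and the sequences are hypothetical.

```python
from math import log


def fit_markov_chain(sequences, states, smoothing=1.0):
    """Estimate smoothed transition probabilities from labelled state sequences."""
    counts = {a: {b: smoothing for b in states} for a in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in states}
        for a in states
    }


def log_likelihood(seq, transitions):
    """Log-probability of the transitions in seq (initial state ignored)."""
    return sum(log(transitions[a][b]) for a, b in zip(seq, seq[1:]))


def classify(seq, model_success, model_failure):
    """Pick the group whose chain explains the sequence better."""
    if log_likelihood(seq, model_success) >= log_likelihood(seq, model_failure):
        return "success"
    return "failure"


# Hypothetical training data: one sequence per group.
states = ["B", "P"]
success_model = fit_markov_chain([list("BBBBPBBB")], states)
failure_model = fit_markov_chain([list("BPPPBPPP")], states)
```

Smoothing keeps every transition probability above zero, so a transition unseen in training does not produce an undefined logarithm.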
Abstract:After birth, extremely preterm infants often require specialized respiratory management in the form of invasive mechanical ventilation (IMV). Protracted IMV is associated with detrimental outcomes and morbidities. Premature extubation, on the other hand, would necessitate reintubation which is risky, technically challenging and could further lead to lung injury or disease. We present an approach to modeling respiratory patterns of infants who succeeded extubation and those who required reintubation which relies on Markov models. We compare the use of traditional Markov chains to semi-Markov models which emphasize cross-pattern transitions and timing information, and to multi-chain Markov models which can concisely represent non-stationarity in respiratory behavior over time. The models we developed expose specific, unique similarities as well as vital differences between the two populations.
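The difference between a plain Markov chain and a semi-Markov view of the same data can be sketched as below. A sample-by-sample sequence is collapsed into runs, so that transitions are counted only between different patterns and the time spent in each pattern is kept as a separate duration list. The state labels and the example sequence are hypothetical, and this shows only the bookkeeping, not the models developed in the study.

```python
from itertools import groupby


def to_segments(seq):
    """Collapse a state sequence into (state, run_length) pairs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(seq)]


def semi_markov_statistics(seq):
    """Count cross-pattern transitions and collect per-state durations."""
    segments = to_segments(seq)

    durations = {}
    for state, length in segments:
        durations.setdefault(state, []).append(length)

    transitions = {}
    # Consecutive segments always differ in state, so no self-transitions appear.
    for (a, _), (b, _) in zip(segments, segments[1:]):
        transitions[(a, b)] = transitions.get((a, b), 0) + 1

    return transitions, durations


# Hypothetical labels: B = regular breathing, P = pause, M = movement artifact.
toy_sequence = list("BBBPPBMMMB")
toy_transitions, toy_durations = semi_markov_statistics(toy_sequence)
```

In a plain Markov chain most of the probability mass sits on self-transitions; separating durations from transitions is what lets the semi-Markov form emphasize how patterns follow one another and how long each lasts.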