Brad Story

Towards detecting the pathological subharmonic voicing with fully convolutional neural networks

Jan 15, 2025

Takeshi Ikuma, Melda Kunduk, Brad Story, Andrew J. McWhorter

Abstract: Many voice disorders induce subharmonic phonation, but voice signal analysis currently lacks a technique for reliably detecting the presence of subharmonics. Distinguishing subharmonic phonation from normal phonation is challenging because both are nearly periodic phenomena. Subharmonic phonation adds cyclical variations to the normal glottal cycles, so estimating the subharmonic period requires a holistic analysis of the signal. Deep learning is an effective solution to this type of complex problem. This paper describes fully convolutional neural networks that are trained with synthesized subharmonic voice signals to classify the subharmonic period. Synthetic evaluation shows over 98% classification accuracy, and assessment of sustained vowel recordings demonstrates encouraging outcomes as well as areas for future improvement.

* 9 pages, 8 figures, submitted to IEEE Trans. Audio Speech Lang. Process.
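The abstract notes that subharmonic and normal phonation are both nearly periodic, so the subharmonic period only becomes visible when several glottal cycles are compared. The toy sketch below illustrates that point with a normalized-autocorrelation heuristic; it is not the paper's fully convolutional network. The pulse-train synthesizer and the names `synth_subharmonic`, `normalized_autocorr`, `subharmonic_order`, and the 0.99 threshold are hypothetical choices made for illustration.

```python
import numpy as np


def synth_subharmonic(f0=100.0, fs=8000, dur=0.5, depth=0.3, order=2):
    """Hypothetical toy signal: a pulse train at f0 in which every `order`-th
    cycle keeps full amplitude and the others are scaled by (1 - depth).
    depth=0 gives ordinary phonation, periodic at the glottal period."""
    n = int(fs * dur)
    t0 = int(round(fs / f0))          # samples per glottal cycle
    pulse = np.hanning(t0 // 4)       # short pulse, shorter than one cycle
    x = np.zeros(n)
    for k, start in enumerate(range(0, n - len(pulse), t0)):
        amp = 1.0 if k % order == 0 else 1.0 - depth
        x[start:start + len(pulse)] += amp * pulse
    return x, t0


def normalized_autocorr(x, lag):
    """Normalized correlation between the signal and a copy shifted by `lag`."""
    a, b = x[:-lag], x[lag:]
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


def subharmonic_order(x, t0, max_order=4, thresh=0.99):
    """Smallest multiple m of the glottal period t0 at which the signal
    repeats almost exactly (m = 1: no subharmonic). Returns None if no
    multiple up to max_order passes the hypothetical threshold."""
    for m in range(1, max_order + 1):
        if normalized_autocorr(x, m * t0) >= thresh:
            return m
    return None
```

With `depth=0.3` and `order=2`, adjacent cycles differ in amplitude, so the correlation at one glottal period stays below the threshold while the signal repeats exactly after two cycles, and `subharmonic_order` returns 2; with `depth=0` it returns 1. Real recordings add jitter, shimmer, and noise, which is one reason a fixed-threshold rule like this is unlikely to transfer directly to sustained-vowel recordings.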


Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals

Aug 31, 2023

Dhananjaya Gowda, Sudarsana Reddy Kadiri, Brad Story, Paavo Alku


Abstract: In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy, wherein an initial set of formant candidates is estimated using short-time analysis (e.g., 10–50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases the residual sparsity by using $L_1$-norm optimization, and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100–200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories. Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner). MATLAB scripts for the proposed method can be found at: https://github.com/njaygowda/ftrack

* IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 28, pp. 1901-1914, 2020
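Point (3) of the abstract, time-varying linear prediction over a long window, can be sketched as a weighted least-squares problem in which each predictor coefficient is a low-order polynomial in time. This is a simplified stand-in, not the authors' TVQCP code (their MATLAB scripts are at the repository given in the abstract): it minimizes an $L_2$ rather than an $L_1$ residual norm, and a hypothetical binary weight that zeroes the excitation instants replaces the quasi-closed-phase weighting function. The names `tvlp`, `lp_coeffs_at`, `formants_at`, and `synth_tv_resonance` are illustrative only.

```python
import numpy as np


def tvlp(x, p=2, q=1, w=None):
    """Weighted least-squares time-varying linear prediction over one window.

    Model: x[n] ~ sum_k a_k(n) x[n-k], with each coefficient a polynomial in
    normalized time, a_k(n) = sum_i b[k-1, i] * (n / N)**i.
    Returns b with shape (p, q + 1)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    w = np.ones(N) if w is None else np.asarray(w, dtype=float)
    n = np.arange(p, N)                      # samples with a full history
    tn = n / N                               # normalized time in [0, 1)
    # One regressor column per (lag k, polynomial power i), k outer, i inner
    cols = [x[n - k] * tn**i for k in range(1, p + 1) for i in range(q + 1)]
    phi = np.stack(cols, axis=1)
    sw = np.sqrt(w[n])                       # row weights for weighted LS
    b, *_ = np.linalg.lstsq(phi * sw[:, None], x[n] * sw, rcond=None)
    return b.reshape(p, q + 1)


def lp_coeffs_at(b, n, N):
    """Evaluate a_1(n) .. a_p(n) from the polynomial coefficients b."""
    powers = (n / N) ** np.arange(b.shape[1])
    return b @ powers


def formants_at(b, n, N, fs):
    """Resonance frequencies (Hz) from the roots of A(z) = 1 - sum_k a_k z^-k."""
    a = lp_coeffs_at(b, n, N)
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]        # keep one root per conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))


def synth_tv_resonance(fs=8000, dur=0.2, f_start=450.0, f_end=550.0, r=0.97, t0=80):
    """Hypothetical test signal: a pulse train through a 2nd-order resonator
    whose a_1(n) varies linearly in n / N (so the pole frequency glides from
    f_start toward f_end) while the pole radius r stays fixed."""
    N = int(fs * dur)
    c0 = np.cos(2 * np.pi * f_start / fs)
    c1 = np.cos(2 * np.pi * f_end / fs)
    pulses = np.arange(0, N, t0)
    e = np.zeros(N)
    e[pulses] = 1.0
    x = np.zeros(N)
    for n in range(N):
        a1 = 2 * r * (c0 + (c1 - c0) * n / N)
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        x[n] = e[n] + a1 * x1 - r * r * x2
    return x, pulses
```

For example, `x, pulses = synth_tv_resonance()`, then a weight vector of ones with `w[pulses] = 0`, then `b = tvlp(x, p=2, q=1, w=w)` and `formants_at(b, len(x) // 2, len(x), 8000)`. Because this synthetic resonance follows exactly the model being fitted and the only nonzero excitation samples receive zero weight, the fit recovers the generating coefficients up to rounding, and the mid-window estimate is roughly 502 Hz. With `w=None`, the pulse instants enter the fit and the estimate is generally no longer exact; reducing that excitation interference is what the quasi-closed-phase weighting and the $L_1$ criterion in the abstract target.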
