Abstract:This paper studies changes in articulatory configurations across genders and periods using an inversion from acoustic to articulatory parameters. From a diachronic corpus based on French media archives spanning 60 years from 1955 to 2015, automatic transcription and forced alignment allowed extracting the central frame of each vowel. More than one million frames were obtained from over a thousand speakers across gender and age categories. Their formants were used from these vocalic frames to fit the parameters of Maeda's articulatory model. Evaluations of the quality of these processes are provided. We focus here on two parameters of Maeda's model linked to total vocal tract length: the relative position of the larynx (higher for females) and the lips protrusion (more protruded for males). Implications for voice quality across genders are discussed. The effect across periods seems gender independent; thus, the assertion that females lowered their pitch with time is not supported.
Abstract:We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65, >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit by linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of the period with a tendency to lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.
Abstract:This paper presents a software allowing to describe voices using a continuous Voice Femininity Percentage (VFP). This system is intended for transgender speakers during their voice transition and for voice therapists supporting them in this process. A corpus of 41 French cis- and transgender speakers was recorded. A perceptual evaluation allowed 57 participants to estimate the VFP for each voice. Binary gender classification models were trained on external gender-balanced data and used on overlapping windows to obtain average gender prediction estimates, which were calibrated to predict VFP and obtained higher accuracy than $F_0$ or vocal track length-based models. Training data speaking style and DNN architecture were shown to impact VFP estimation. Accuracy of the models was affected by speakers' age. This highlights the importance of style, age, and the conception of gender as binary or not, to build adequate statistical representations of cultural concepts.