Abstract: Intelligible speech is produced by creating varying internal local muscle groupings---i.e., functional units---that are generated in a systematic and coordinated manner. There are two major challenges in characterizing and analyzing functional units. First, because tongue structure and function are complex and convoluted, a method is needed that can accurately decode complex muscle coordination patterns during speech. Second, functional units vary substantially across subjects, which makes it difficult to keep the identified units comparable. In this work, to address these challenges, we develop a new deep learning framework to identify common and subject-specific functional units of tongue motion during speech. Our framework hinges on joint deep graph-regularized sparse non-negative matrix factorization (NMF) using motion quantities derived from displacements measured by tagged magnetic resonance imaging (MRI). More specifically, we transform NMF with sparse and manifold regularizations into modular architectures akin to deep neural networks by unfolding the Iterative Shrinkage-Thresholding Algorithm (ISTA) to learn interpretable building blocks and an associated weighting map. We then apply spectral clustering to common and subject-specific functional units. Experiments carried out with simulated datasets show that the proposed method surpasses the comparison methods. Experiments carried out with in vivo tongue motion datasets show that the proposed method can determine the common and subject-specific functional units with increased interpretability and decreased size variability.
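To make the unfolding idea concrete, the following is a minimal sketch of ISTA applied to a non-negative, sparse, graph-regularized coding problem, where each iteration plays the role of one "layer". It is an illustration under stated assumptions, not the framework described in the abstract: the objective, the function names (`graph_laplacian`, `ista_layer`, `unrolled_ista`, `objective`), the values of `lam` and `mu`, the fixed building blocks `W`, and the toy data are all hypothetical. In a learned unfolded network the per-layer parameters would be trained, and the joint common and subject-specific factorization is not modeled here.

```python
import numpy as np

def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric affinity matrix A."""
    return np.diag(A.sum(axis=1)) - A

def objective(X, W, H, L, lam, mu):
    """0.5*||X - WH||_F^2 + lam*||H||_1 + 0.5*mu*tr(H L H^T), assuming H >= 0."""
    fit = 0.5 * np.linalg.norm(X - W @ H) ** 2
    sparsity = lam * H.sum()  # equals the L1 norm because H is non-negative
    smooth = 0.5 * mu * np.trace(H @ L @ H.T)
    return fit + sparsity + smooth

def ista_layer(H, X, W, L, lam, mu, step):
    """One ISTA step: gradient step on the smooth terms, then a
    non-negative soft threshold (the prox of lam*||H||_1 with H >= 0)."""
    grad = W.T @ (W @ H - X) + mu * (H @ L)
    return np.maximum(H - step * (grad + lam), 0.0)

def unrolled_ista(X, W, L, lam=0.1, mu=0.1, n_layers=50):
    """Stack n_layers identical ISTA steps (parameters fixed, not learned)."""
    # Step size 1 / (upper bound on the Lipschitz constant of the gradient).
    step = 1.0 / (np.linalg.norm(W.T @ W, 2) + mu * np.linalg.norm(L, 2))
    H = np.zeros((W.shape[1], X.shape[1]))
    for _ in range(n_layers):
        H = ista_layer(H, X, W, L, lam, mu, step)
    return H

# Toy data (hypothetical): 6 motion features, 8 voxels, 3 building blocks.
rng = np.random.default_rng(0)
W = rng.random((6, 3))
H_true = rng.random((3, 8)) * (rng.random((3, 8)) > 0.5)
X = W @ H_true
A = np.eye(8, k=1) + np.eye(8, k=-1)  # chain graph linking neighboring voxels
L = graph_laplacian(A)
H = unrolled_ista(X, W, L)
```

The rows of `H` correspond to a weighting map over voxels. With a step size no larger than the inverse Lipschitz constant, each layer does not increase the objective, which is the property that makes a fixed number of unrolled steps a reasonable network template.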
Abstract: Muscle coordination patterns of lingual behaviors are synergies generated by deforming local muscle groups in a variety of ways. Functional units are functional muscle groups of local structural elements within the tongue that compress, expand, and move in a cohesive and consistent manner. Identifying the functional units using tagged magnetic resonance imaging (MRI) sheds light on the mechanisms of normal and pathological muscle coordination patterns, yielding improvements in surgical planning, treatment, or rehabilitation procedures. Here, to mine this information, we propose a matrix factorization and probabilistic graphical model framework to produce building blocks and their associated weighting map using motion quantities extracted from tagged-MRI. Our tagged-MRI and accurate voxel-level tracking provide previously unavailable internal tongue motion patterns, thus revealing the inner workings of the tongue during speech or other lingual behaviors. We then employ spectral clustering on the weighting map to identify the cohesive regions defined by the tongue motion that may involve multiple or undocumented regions. To evaluate our method, we perform a series of experiments. We first use two-dimensional images and synthetic data to demonstrate the accuracy of our method. We then use three-dimensional synthetic and \textit{in vivo} tongue motion data from protrusion and simple speech tasks to identify subject-specific and data-driven functional units of the tongue in localized regions.
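The clustering step can be illustrated with a small sketch of spectral clustering applied to a weighting map. This is only one plausible instance under assumptions: the Gaussian affinity, the bandwidth `sigma`, the symmetric normalization, the simple k-means with farthest-point initialization, and the names `spectral_embedding` and `kmeans` are hypothetical choices, and the toy weighting map `H` is synthetic rather than derived from tagged-MRI.

```python
import numpy as np

def spectral_embedding(H, k, sigma=0.5):
    """Embed voxels (columns of the weighting map H) with the top-k
    eigenvectors of the normalized affinity D^{-1/2} A D^{-1/2}."""
    V = H.T  # one row per voxel
    d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-d2 / (2.0 * sigma**2))  # Gaussian affinity (assumed)
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    U = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        # Keep the old center if a cluster happens to be empty.
        centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels

# Synthetic weighting map: 2 building blocks over 10 voxels, where voxels
# 0-4 load on block 0 and voxels 5-9 load on block 1, plus small noise.
rng = np.random.default_rng(0)
H = np.zeros((2, 10))
H[0, :5] = 1.0
H[1, 5:] = 1.0
H += 0.01 * rng.random(H.shape)

labels = kmeans(spectral_embedding(H, k=2), k=2)
```

In this toy case the two voxel groups receive different labels, so each label plays the role of one candidate functional unit. For real data, the number of clusters and the affinity definition would need to be chosen with care.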