Abstract:Motor control is a set of time-varying muscle excitations which generate desired motions for a biomechanical system. Muscle excitations cannot be directly measured from live subjects. An alternative approach is to estimate muscle activations using inverse motion-driven simulation. In this article, we propose a deep reinforcement learning method to estimate the muscle excitations in simulated biomechanical systems. Here, we introduce a custom-made reward function which incentivizes faster point-to-point tracking of target motion. Moreover, we deploy two new techniques, namely, episode-based hard update and dual buffer experience replay, to avoid feedback training loops. The proposed method is tested in four simulated 2D and 3D environments with 6 to 24 axial muscles. The results show that the models were able to learn muscle excitations for given motions after nearly 100,000 simulated steps. Moreover, the root mean square error in point-to-point reaching of the target across experiments was less than 1% of the length of the domain of motion. Our reinforcement learning method is far from the conventional dynamic approaches as the muscle control is derived functionally by a set of distributed neurons. This can open paths for neural activity interpretation of this phenomenon.
Abstract:Vocal tract configurations play a vital role in generating distinguishable speech sounds, by modulating the airflow and creating different resonant cavities in speech production. They contain abundant information that can be utilized to better understand the underlying speech production mechanism. As a step towards automatic mapping of vocal tract shape geometry to acoustics, this paper employs effective video action recognition techniques, like Long-term Recurrent Convolutional Networks (LRCN) models, to identify different vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract. Such a model typically combines a CNN based deep hierarchical visual feature extractor with Recurrent Networks, that ideally makes the network spatio-temporally deep enough to learn the sequential dynamics of a short video clip for video classification tasks. We use a database consisting of 2D real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The comparative performances of this class of algorithms under various parameter settings and for various classification tasks are discussed. Interestingly, the results show a marked difference in the model performance in the context of speech classification with respect to generic sequence or video classification tasks.