Abstract: Automatic markerless estimation of infant posture and motion from ordinary videos carries great potential for movement studies "in the wild", facilitating understanding of motor development and massively increasing the chances of early diagnosis of disorders. Human pose estimation methods in computer vision are developing rapidly thanks to advances in deep learning and machine learning. However, these methods are trained on datasets featuring adults in different contexts. This work tests and compares seven popular methods (AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose, OpenPose, and ViTPose) on videos of infants in supine position. Surprisingly, all methods except DeepLabCut and MediaPipe have competitive performance without additional finetuning, with ViTPose performing best. In addition to standard performance metrics (object keypoint similarity, average precision and recall), we introduce errors expressed in the neck-mid-hip ratio and additionally study missed and redundant detections and the reliability of the internal confidence ratings of the different methods, which are relevant for downstream tasks. Among the networks with competitive performance, only AlphaPose could run close to real time (27 fps) on our machine. We provide documented Docker containers or instructions for all the methods we used, our analysis scripts, and processed data at https://hub.docker.com/u/humanoidsctu and https://osf.io/x465b/.
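To make the two kinds of metric named above concrete, the following is a minimal sketch, not the authors' analysis script. It assumes the COCO-style definition of object keypoint similarity (a Gaussian of the keypoint error scaled by object area and a per-keypoint constant) and assumes that "errors expressed in the neck-mid-hip ratio" means the Euclidean keypoint error divided by the ground-truth neck to mid-hip distance. The function names, the keypoint ordering, and the constants are hypothetical.

```python
import math

def oks(pred, gt, visible, area, kappas):
    """COCO-style object keypoint similarity for one person.

    pred, gt: lists of (x, y) keypoints; visible: per-keypoint flags;
    area: object area in pixels^2; kappas: per-keypoint constants
    (COCO uses 2 * sigma_i). All names here are illustrative.
    """
    total, count = 0.0, 0
    for p, g, v, k in zip(pred, gt, visible, kappas):
        if not v:
            continue  # only labelled keypoints contribute
        d2 = (p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0

def torso_normalized_errors(pred, gt, neck_idx, midhip_idx):
    """Per-keypoint error divided by the ground-truth neck to mid-hip distance.

    Assumption: this is one plausible reading of the neck-mid-hip ratio.
    """
    torso = math.dist(gt[neck_idx], gt[midhip_idx])
    if torso == 0.0:
        raise ValueError("neck and mid-hip coincide; cannot normalize")
    return [math.dist(p, g) / torso for p, g in zip(pred, gt)]

# Hypothetical three-keypoint layout: neck, mid-hip, one wrist.
gt = [(0.0, 0.0), (0.0, 10.0), (5.0, 5.0)]
pred = [(0.0, 0.0), (0.0, 10.0), (5.0, 10.0)]
errors = torso_normalized_errors(pred, gt, neck_idx=0, midhip_idx=1)
score = oks(pred, gt, visible=[1, 1, 1], area=100.0, kappas=[0.1, 0.1, 0.1])
```

Normalizing by a torso length rather than by bounding-box area makes the error comparable across infants of different size and across camera distances, which is presumably why such a ratio is useful here.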
Abstract: The mechanisms of infant development are far from understood. Learning about one's own body is likely a foundation for subsequent development. Here we look specifically at the problem of how spontaneous touches to the body in early infancy may give rise to first body models and bootstrap further development such as reaching competence. Unlike visually elicited reaching, reaching to one's own body requires connections between the tactile and motor spaces only, bypassing vision. Still, the problems of high dimensionality and redundancy of the motor system persist. In this work, we present an embodied computational model on a simulated humanoid robot with artificial sensitive skin on large areas of its body. The robot should autonomously develop the capacity to reach for every tactile sensor on its body. To do this efficiently, we employ the computational framework of intrinsic motivations and variants of goal babbling, as opposed to motor babbling, which prove to make the exploration process faster and alleviate the ill-posedness of learning inverse kinematics. Based on our results, we discuss the next steps in relation to infant studies: what information will be necessary to further ground this computational model in behavioral data.
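The contrast between goal babbling and motor babbling can be illustrated with a toy sketch. This is not the model from the paper: it assumes a hypothetical two-link planar arm in place of the humanoid, 2D points in place of tactile sensors, a nearest-neighbour memory in place of a learned inverse model, and uniform goal sampling with no intrinsic motivation. Motor babbling would sample joint angles directly; here each iteration samples a goal in task space and perturbs the best known command for it.

```python
import math
import random

LINK_1, LINK_2 = 1.0, 1.0  # hypothetical link lengths

def forward(q):
    """Forward kinematics of a planar two-link arm (illustrative only)."""
    x = LINK_1 * math.cos(q[0]) + LINK_2 * math.cos(q[0] + q[1])
    y = LINK_1 * math.sin(q[0]) + LINK_2 * math.sin(q[0] + q[1])
    return (x, y)

def nearest(memory, goal):
    """Stored (command, outcome) pair whose outcome is closest to the goal."""
    return min(memory, key=lambda entry: math.dist(entry[1], goal))

def goal_babbling(n_iter, noise=0.2, seed=0):
    """Explore by sampling goals in task space, not joint angles."""
    rng = random.Random(seed)
    q0 = (0.0, 0.0)
    memory = [(q0, forward(q0))]
    for _ in range(n_iter):
        # Sample a goal in a square that encloses the reachable workspace.
        goal = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        q_near, _ = nearest(memory, goal)
        # Try a perturbed version of the best known command for this goal.
        q = tuple(angle + rng.gauss(0.0, noise) for angle in q_near)
        memory.append((q, forward(q)))
    return memory

def reach_error(memory, goal):
    """Distance between the goal and the closest outcome seen so far."""
    return math.dist(nearest(memory, goal)[1], goal)

mem_small = goal_babbling(10, seed=1)
mem_large = goal_babbling(200, seed=1)
target = (1.0, 1.0)
err_small = reach_error(mem_small, target)
err_large = reach_error(mem_large, target)
```

Because every stored command is one that was actually executed, redundant joint configurations that reach the same point never get averaged together, which is one way goal-directed exploration sidesteps the ill-posedness of inverse kinematics.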
Abstract: In primate brains, tactile and proprioceptive inputs are relayed to the somatosensory cortex, which is known for somatotopic representations, or "homunculi". Our research centers on understanding the mechanisms of the formation of these and higher-level body representations (body schema) by using humanoid robots and neural networks to construct models. We specifically focus on how spatial representation of the body may be learned from somatosensory information in self-touch configurations. In this work, we target the representation of proprioceptive inputs, which we take to be joint angles in the robot. The joint angles collected in different body postures serve as inputs to a Self-Organizing Map (SOM) with a 2D lattice on the output. With unrestricted, all-to-all connections, the map is not capable of representing the input space while preserving the topological relationships, because the intrinsic dimensionality of the body posture space is too large. Hence, we use a method we developed previously for tactile inputs (Hoffmann, Straka et al. 2018) called MRF-SOM, where the Maximum Receptive Field of output neurons is restricted so they only learn to represent specific parts of the input space. This is in line with the receptive fields of neurons in somatosensory areas representing proprioception, which often respond to combinations of a few joints (e.g. wrist and elbow).
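The idea of restricting which inputs an output neuron may represent can be sketched as a SOM with a per-unit binary mask over the input dimensions. This is an illustrative assumption about the mechanism, not the MRF-SOM implementation of Hoffmann, Straka et al.: the function names, the mask layout, the fixed learning rate, and the fixed Gaussian neighbourhood are all hypothetical. Each unit measures distance, and updates its weights, only on the joints inside its mask.

```python
import math
import random

def best_matching_unit(x, weights, masks):
    """Index of the unit closest to x, judged only on each unit's masked inputs."""
    def masked_distance(u):
        active = [j for j in range(len(x)) if masks[u][j]]
        # Mean squared difference, so units with larger masks are not penalized.
        return sum((x[j] - weights[u][j]) ** 2 for j in active) / len(active)
    return min(range(len(weights)), key=masked_distance)

def train_masked_som(data, masks, grid_w, epochs=20, lr=0.3, sigma=1.0, seed=0):
    """Train a 2D-lattice SOM whose units see only their masked input dimensions.

    masks[u][j] is 1 if unit u may represent input dimension j (assumed layout);
    every unit needs at least one active dimension.
    """
    rng = random.Random(seed)
    n_units, dim = len(masks), len(data[0])
    # Masked-out weights start at 0.0 and are never updated.
    weights = [[rng.random() if masks[u][j] else 0.0 for j in range(dim)]
               for u in range(n_units)]
    for _ in range(epochs):
        for x in data:
            bmu = best_matching_unit(x, weights, masks)
            b_row, b_col = divmod(bmu, grid_w)
            for u in range(n_units):
                row, col = divmod(u, grid_w)
                lattice_d2 = (row - b_row) ** 2 + (col - b_col) ** 2
                h = math.exp(-lattice_d2 / (2.0 * sigma ** 2))
                for j in range(dim):
                    if masks[u][j]:
                        weights[u][j] += lr * h * (x[j] - weights[u][j])
    return weights

# Hypothetical setup: 4 joint angles scaled to [0, 1], a 2x2 output lattice.
# The top row sees joints 0 and 1, the bottom row sees joints 2 and 3.
data_rng = random.Random(0)
data = [[data_rng.random() for _ in range(4)] for _ in range(50)]
masks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
weights = train_masked_som(data, masks, grid_w=2)
```

With masks of this kind, each unit only has to cover a low-dimensional slice of posture space, such as a pair of joints, which is the property the abstract relates to proprioceptive receptive fields.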