Abstract:My research objective is to bridge the gap between high computational performance and low power dissipation in robot on-board hardware by designing a bio-inspired tapered-whisker neuromorphic computing (also called reservoir computing) system for off-road robot environment perception and navigation, one that centres on the interaction between a robot's body and its environment. Mobile robots performing tasks in unknown environments need to traverse a variety of complex terrains, and they must be able to identify and characterize these terrains reliably and quickly to avoid challenging or catastrophic situations. To solve this problem, I drew inspiration from animals such as rats and seals, which rely on whiskers alone to perceive their surroundings and survive in dark, narrow environments. I also looked to the human cochlea, which can separate the different frequencies of a sound. Based on these insights, my work explores, step by step, physical whisker-based reservoir computing for quick and cost-efficient environment perception and navigation in mobile robots. This research could help us understand how compliance like that of the biological counterparts helps robots interact dynamically with the environment, and it offers an alternative to current methods for robot environment perception and navigation where computational resources are limited, such as on Mars.
Abstract:Existing differentiable channel pruning methods often attach scaling factors or masks behind channels to prune less important filters, and assume that all input samples contribute uniformly to filter importance. In particular, the effect of instance complexity on pruning performance has not yet been fully investigated. In this paper, we propose CAP, a simple yet effective differentiable network pruning method based on instance-complexity-aware filter importance scores. We define an instance-complexity-related weight for each sample, giving higher weights to hard samples, and take the weighted sum of sample-specific soft masks to model the non-uniform contribution of different inputs. This encourages hard samples to dominate the pruning process and helps preserve model performance. In addition, we introduce a new regularizer that encourages polarization of the masks, so that a sweet spot for identifying the filters to be pruned can be found easily. Performance evaluations on various network architectures and datasets demonstrate that CAP has advantages over state-of-the-art methods in pruning large networks. For instance, CAP improves the accuracy of ResNet56 on the CIFAR-10 dataset by 0.33% after removing 65.64% of FLOPs, and prunes 87.75% of the FLOPs of ResNet50 on the ImageNet dataset with only 0.89% Top-1 accuracy loss.
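The weighted aggregation described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: using the per-sample loss as the complexity proxy, the softmax weighting, the 0.5 threshold, the `m * (1 - m)` polarization penalty, and every function name below are assumptions made for illustration only.

```python
import math

def complexity_weights(losses, temperature=1.0):
    """Softmax over per-sample losses, so harder (higher-loss) samples weigh more.
    Assumption: the loss stands in for instance complexity."""
    top = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - top) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]

def filter_importance(soft_masks, weights):
    """Weighted sum of sample-specific soft masks.
    soft_masks[i][j] is the mask value in [0, 1] of filter j for sample i."""
    n_filters = len(soft_masks[0])
    return [
        sum(w * row[j] for w, row in zip(weights, soft_masks))
        for j in range(n_filters)
    ]

def polarization_penalty(scores):
    """Hypothetical regularizer: zero only when every score is exactly 0 or 1."""
    return sum(s * (1.0 - s) for s in scores) / len(scores)

# Toy data: two easy samples and one hard sample, two filters.
losses = [0.1, 0.2, 2.0]
soft_masks = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

weights = complexity_weights(losses)
scores = filter_importance(soft_masks, weights)
keep = [s >= 0.5 for s in scores]  # assumed threshold between the two poles

uniform = [1.0 / len(losses)] * len(losses)
uniform_scores = filter_importance(soft_masks, uniform)
```

In this toy case a uniform average would favour the first filter, while the complexity-weighted score favours the second because the hard sample relies on it.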
Abstract:This paper presents analytical and experimental evidence that the vibration dynamics of a compliant whisker can be used for accurate terrain classification during steady-state motion of a mobile robot. A Hall effect sensor was used to measure whisker vibrations caused by perturbations from the ground. Analytical results predict that the whisker vibrations will have a dominant frequency at the vertical perturbation frequency of the mobile robot, sandwiched between two other less dominant but distinct frequency components. These frequency components may come from bifurcation of the vibration frequency due to nonlinear interaction dynamics at steady state. Experimental results also exhibit distinct dominant frequency components unique to the speed of the robot and the terrain roughness. This nonlinear dynamic feature is used in a deep multi-layer perceptron neural network to classify terrains. We achieved an 85.6% prediction success rate for seven flat terrain surfaces with different textures.
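As a rough illustration of the frequency feature described in this abstract, the sketch below extracts the strongest spectral components from a vibration signal with a naive discrete Fourier transform. The synthetic signal, its 3, 5, and 7 Hz components, the 64 Hz sample rate, and the function names are all hypothetical; the paper's actual signal processing and classifier are not reproduced here.

```python
import cmath
import math

def magnitude_spectrum(signal, sample_rate):
    """Naive O(n^2) DFT; returns (frequencies in Hz, magnitudes) up to Nyquist."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc) / n)
    return freqs, mags

def top_frequencies(signal, sample_rate, count=3):
    """Frequencies of the `count` largest spectral bins, skipping the DC bin."""
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    ranked = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)
    return sorted(freqs[k] for k in ranked[:count])

# Synthetic stand-in for a sensor trace: a dominant 5 Hz component
# sandwiched between weaker components at 3 Hz and 7 Hz.
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.4 * math.sin(2 * math.pi * 3 * t / sample_rate)
    + 0.4 * math.sin(2 * math.pi * 7 * t / sample_rate)
    for t in range(64)
]

peaks = top_frequencies(signal, sample_rate)
dominant = top_frequencies(signal, sample_rate, count=1)[0]
```

Features of this kind, for example the peak frequencies and their magnitudes, could then be passed to a classifier such as the multi-layer perceptron mentioned in the abstract.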
Abstract:In this paper, we propose a spatial-temporal relational reasoning networks (STRRN) approach to investigate the problem of omni-supervised face alignment in videos. Unlike existing fully supervised methods, which rely on numerous hand-made annotations, our learner exploits large-scale unlabeled videos together with the available labeled data to generate plausible auxiliary training annotations. Motivated by the fact that neighbouring facial landmarks are usually correlated and coherent across consecutive frames, our approach automatically reasons about discriminative spatial-temporal relationships among landmarks for stable face tracking. Specifically, we develop an interpretable and efficient network module that disentangles facial geometry relationships for every static frame and simultaneously enforces bi-directional cycle-consistency across adjacent frames, thus allowing the modeling of intrinsic spatial-temporal relations from raw face sequences. Extensive experimental results demonstrate that our approach surpasses the performance of most fully supervised state-of-the-art methods.