Abstract: Age estimation from a single face image is an essential task in human-computer interaction and computer vision, with a wide range of practical applications. Existing methods achieve relatively low accuracy on face images in the wild because they consider only the global features of the face image and neglect the fine-grained features of age-sensitive areas. To address this problem, we propose an attention LSTM network for fine-grained age estimation in the wild, based on the ideas of fine-grained categorization and the visual attention mechanism. The method combines ResNets or RoR models with an LSTM unit to construct AL-ResNets or AL-RoR networks that extract age-sensitive local regions, which effectively improves age estimation accuracy. First, a ResNets or RoR model pre-trained on the ImageNet dataset is selected as the base model and fine-tuned on the IMDB-WIKI-101 dataset for age estimation. Then, we fine-tune the ResNets or RoR model on the target age datasets to extract the global features of face images. To extract the local features of age-sensitive areas, an LSTM unit is then introduced to obtain the coordinates of the age-sensitive region automatically. Finally, age group classification experiments are conducted directly on the Adience dataset, and age regression experiments are performed with the Deep EXpectation algorithm (DEX) on the MORPH Album 2, FG-NET and LAP datasets. The final predictions are obtained by combining the global and local features. Our experiments illustrate the effectiveness of AL-ResNets and AL-RoR for age estimation in the wild, where they achieve state-of-the-art performance, outperforming all other CNN methods on the Adience, MORPH Album 2, FG-NET and LAP datasets.
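The DEX-style age regression mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly described formulation, in which the network outputs one logit per age class and the predicted age is the softmax-weighted expected value over those classes. The 101 classes (ages 0 to 100) and the function names `softmax` and `expected_age` are illustrative assumptions, not details taken from the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(logits, ages=None):
    """Expected age under the softmax distribution over age classes.

    Assumption: class i corresponds to age i unless `ages` is given.
    """
    if ages is None:
        ages = range(len(logits))
    probs = softmax(logits)
    return sum(p * a for p, a in zip(probs, ages))


if __name__ == "__main__":
    # Hypothetical logits: a peak around age 30 with mass on neighbouring ages.
    logits = [0.0] * 101
    logits[29], logits[30], logits[31] = 4.0, 5.0, 4.0
    print(expected_age(logits))
```

Taking the expectation rather than the arg-max class lets the prediction fall between integer ages, which is why a classification network can be used for regression in this way.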
Abstract: Automatically predicting age group and gender from face images acquired in unconstrained conditions is an important and challenging task in many real-world applications. Conventional methods based on manually designed features perform unsatisfactorily on in-the-wild benchmarks because they cannot cope with the large variations in unconstrained images. Convolutional Neural Networks (CNNs) alleviate this difficulty to some degree through their powerful feature representation. In this paper, we propose a new CNN-based method for age group and gender estimation that leverages Residual Networks of Residual Networks (RoR), which exhibit better optimization ability for age group and gender classification than other CNN architectures. Moreover, two modest mechanisms based on observations of the characteristics of age groups are presented to further improve the performance of age estimation. To further improve performance and alleviate over-fitting, the RoR model is first pre-trained on ImageNet, then fine-tuned on the IMDB-WIKI-101 data set to further learn the features of face images, and finally fine-tuned on the Adience data set. Our experiments illustrate the effectiveness of the RoR method for age and gender estimation in the wild, where it achieves better performance than other CNN methods. Finally, RoR-152+IMDB-WIKI-101 with the two mechanisms achieves new state-of-the-art results on the Adience benchmark.
Abstract: The residual-network family, with hundreds or even thousands of layers, dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to further exploit the optimization ability of residual networks. RoR replaces the optimization of the original residual mapping with the optimization of a residual mapping of residual mappings. In particular, RoR adds level-wise shortcut connections on top of the original residual networks to promote their learning capability. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boosts their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance among all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors of 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on the ImageNet data set.
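The idea of level-wise shortcut connections can be sketched with a toy example in plain Python. This is only an illustration under stated assumptions: it assumes three shortcut levels (one inside each residual block, one around each group of blocks, one around the whole stack), identity shortcuts everywhere, and vectors represented as lists of floats. The names `residual_block`, `block_group` and `ror3` are hypothetical, and details of a real network such as convolutions, normalization, activations and projection shortcuts are omitted.

```python
def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def residual_block(x, f):
    """Final-level shortcut: output = x + F(x)."""
    return add(x, f(x))


def block_group(x, fs):
    """Middle-level shortcut wrapped around a stack of residual blocks."""
    out = x
    for f in fs:
        out = residual_block(out, f)
    return add(out, x)


def ror3(x, groups):
    """Root-level shortcut wrapped around all groups."""
    out = x
    for fs in groups:
        out = block_group(out, fs)
    return add(out, x)


if __name__ == "__main__":
    # Hypothetical residual function standing in for a learned mapping.
    def halve(v):
        return [0.5 * t for t in v]

    groups = [[halve, halve] for _ in range(3)]
    print(ror3([1.0, 2.0], groups))
```

Because every level adds its input back to its output, each level only has to learn a residual relative to the level below it, which is the sense in which the network optimizes a residual mapping of residual mappings.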