Abstract: This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks, the first of its kind in the field. It covers a wide range of research areas, including person ReID, human parsing, human pose estimation, and pedestrian detection, and addresses the significant challenges posed by overfitting and limited training data in these domains. Our work categorizes data augmentation methods into two main types: data generation and data perturbation. Data generation covers techniques such as graphic engine-based generation, generative model-based generation, and data recombination, while data perturbation is divided into image-level and human-level perturbations. Each method is tailored to the unique requirements of human-centric tasks, and some are applicable across multiple areas. Our contributions include an extensive literature review that provides deep insights into the influence of these augmentation techniques in human-centric vision and highlights the nuances of each method. We also discuss open issues and future directions, such as the integration of advanced generative models like Latent Diffusion Models to create more realistic and diverse training data. This survey not only encapsulates the current state of data augmentation in human-centric vision but also charts a course for future research, aiming to develop more robust, accurate, and efficient human-centric vision systems.
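As a rough illustration of the image-level perturbation category mentioned in this abstract, the sketch below overwrites one random rectangle of an image with noise, in the spirit of random erasing. The abstract does not name this particular technique; the function name `random_erase`, the grayscale list-of-rows image format, and the default area and aspect-ratio ranges are all assumptions chosen for illustration.

```python
import math
import random


def random_erase(image, rng, area_frac=(0.02, 0.2), aspect=(0.3, 3.3), tries=10):
    """Return a copy of `image` (list of rows of 0-255 ints) with one random
    rectangle overwritten by noise; the input is left unchanged.

    All parameter names and default ranges are illustrative assumptions.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(tries):
        area = rng.uniform(*area_frac) * h * w   # target rectangle area in pixels
        ratio = rng.uniform(*aspect)             # height / width of the rectangle
        eh = round(math.sqrt(area * ratio))
        ew = round(math.sqrt(area / ratio))
        if 0 < eh < h and 0 < ew < w:
            top = rng.randint(0, h - eh)         # randint is inclusive on both ends
            left = rng.randint(0, w - ew)
            for y in range(top, top + eh):
                for x in range(left, left + ew):
                    out[y][x] = rng.randint(0, 255)
            return out
    return out  # no rectangle fit within `tries`; return the unmodified copy


# Example: perturb a blank 128 x 64 grayscale image with a fixed seed.
image = [[0] * 64 for _ in range(128)]
augmented = random_erase(image, random.Random(0))
```

A human-level perturbation would differ mainly in where the rectangle is placed, for instance by restricting it to a person's bounding box rather than the whole image.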
Abstract: Recent years have witnessed unprecedented amounts of data generated by telecommunication (Telco) cellular networks. For example, measurement records (MRs) are generated to report the connection states between mobile devices and Telco networks, e.g., received signal strength. MR data have been widely used to localize outdoor mobile devices for human mobility analysis, urban planning, and traffic forecasting. Existing works using first-order sequence models such as the Hidden Markov Model (HMM) attempt to capture spatio-temporal locality in the underlying mobility patterns to lower localization errors. These HMM approaches typically assume that the mobility patterns of the underlying mobile devices are stable. Yet real MR datasets exhibit heterogeneous mobility patterns due to the mixed transportation modes of the underlying mobile devices and the uneven distribution of the positions associated with MR samples. The existing solutions therefore cannot handle these heterogeneous mobility patterns. To address this, we propose a multi-task learning-based deep neural network (DNN) framework, namely PRNet+, that combines outdoor position recovery with transportation mode detection. To make the framework effective, PRNet+ develops a feature extraction module to precisely learn local-, short-, and long-term spatio-temporal locality from heterogeneous MR samples. Extensive evaluation on eight datasets collected in three representative areas in Shanghai indicates that PRNet+ greatly outperforms state-of-the-art approaches.
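The multi-task idea in this abstract, learning position recovery and transportation mode detection together, is commonly expressed as a weighted sum of a regression loss and a classification loss. The sketch below shows that generic formulation only. The abstract does not state the loss PRNet+ actually uses, so the function names, the choice of mean Euclidean distance and cross-entropy, and the `weight` hyperparameter are all assumptions.

```python
import math


def position_loss(pred, true):
    """Mean Euclidean distance between predicted and true (x, y) positions."""
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(true)


def mode_loss(probs, labels):
    """Mean cross-entropy of predicted mode probabilities against integer labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def joint_loss(pred_pos, true_pos, mode_probs, mode_labels, weight=0.5):
    """Weighted sum of both task losses; `weight` is a hypothetical hyperparameter."""
    return position_loss(pred_pos, true_pos) + weight * mode_loss(mode_probs, mode_labels)


# Example: two MR samples, two hypothetical transportation modes (0 and 1).
loss = joint_loss(
    pred_pos=[(0.0, 0.0), (3.0, 4.0)],
    true_pos=[(0.0, 0.0), (0.0, 0.0)],
    mode_probs=[[1.0, 0.0], [0.5, 0.5]],
    mode_labels=[0, 1],
)
```

In a shared-encoder network, both loss terms would backpropagate into the same feature extraction layers, which is what lets the mode signal inform position recovery.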
Abstract: Telecommunication (Telco) outdoor position recovery aims to localize outdoor mobile devices by leveraging measurement report (MR) data. Unfortunately, Telco position recovery requires a sufficient amount of MR samples across different areas and suffers from high data collection costs. For an area with scarce MR samples, it is hard to achieve good accuracy. In this paper, by leveraging recently developed transfer learning techniques, we design a novel Telco position recovery framework, called TLoc, that transfers good models from carefully selected source domains (fine-grained small subareas) to a target domain that originally suffers from poor localization accuracy. Specifically, TLoc introduces three dedicated components: 1) a new coordinate space to divide an area of interest into smaller domains, 2) a similarity measurement to select the best source domains, and 3) an adaptation of an existing transfer learning approach. To the best of our knowledge, TLoc is the first framework to demonstrate the efficacy of applying transfer learning to Telco outdoor position recovery. For example, on 2G GSM and 4G LTE MR datasets in Shanghai, TLoc reduces median errors by 27.58% and 26.12%, respectively, compared with a non-transfer approach, and by 47.77% and 49.22%, respectively, compared with a recent fingerprinting approach, NBL.
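The second TLoc component, selecting the best source domains by similarity, can be sketched as a simple top-k ranking. The abstract does not specify the similarity measurement or how domains are represented, so the cosine similarity, the per-domain feature vectors, and the names `cosine` and `select_sources` below are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_sources(target, candidates, k=2):
    """Return the names of the k candidate domains most similar to `target`.

    `candidates` maps a domain name to its (hypothetical) feature vector.
    """
    ranked = sorted(candidates, key=lambda name: cosine(target, candidates[name]), reverse=True)
    return ranked[:k]


# Example: three hypothetical subareas described by two made-up features each.
sources = select_sources(
    target=[1.0, 0.0],
    candidates={"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.9, 0.5]},
    k=2,
)
```

Models trained on the selected source domains would then be adapted to the target domain by the third component, which this sketch does not cover.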