Tandon School of Engineering, New York University, New York, USA
Abstract:Visual bird's eye view (BEV) semantic segmentation helps autonomous vehicles understand the surrounding environment only from images, including static elements (e.g., roads) and dynamic elements (e.g., vehicles, pedestrians). However, the high cost of annotation procedures of full-supervised methods limits the capability of the visual BEV semantic segmentation, which usually needs HD maps, 3D object bounding boxes, and camera extrinsic matrixes. In this paper, we present a novel semi-supervised framework for visual BEV semantic segmentation to boost performance by exploiting unlabeled images during the training. A consistency loss that makes full use of unlabeled data is then proposed to constrain the model on not only semantic prediction but also the BEV feature. Furthermore, we propose a novel and effective data augmentation method named conjoint rotation which reasonably augments the dataset while maintaining the geometric relationship between the front-view images and the BEV semantic segmentation. Extensive experiments on the nuScenes and Argoverse datasets show that our semi-supervised framework can effectively improve prediction accuracy. To the best of our knowledge, this is the first work that explores improving visual BEV semantic segmentation performance using unlabeled data. The code will be publicly available.
Abstract:Well-calibrated traffic flow models are fundamental to understanding traffic phenomena and designing control strategies. Traditional calibration has been developed base on optimization methods. In this paper, we propose a novel physics-informed, learning-based calibration approach that achieves performances comparable to and even better than those of optimization-based methods. To this end, we combine the classical deep autoencoder, an unsupervised machine learning model consisting of one encoder and one decoder, with traffic flow models. Our approach informs the decoder of the physical traffic flow models and thus induces the encoder to yield reasonable traffic parameters given flow and speed measurements. We also introduce the denoising autoencoder into our method so that it can handles not only with normal data but also with corrupted data with missing values. We verified our approach with a case study of I-210 E in California.
Abstract:This paper presents a generalization of conventional sliding mode control designs for systems in Euclidean spaces to fully actuated simple mechanical systems whose configuration space is a Lie group for the trajectory-tracking problem. A generic kinematic control is first devised in the underlying Lie algebra, which enables the construction of a Lie group on the tangent bundle where the system state evolves. A sliding subgroup is then proposed on the tangent bundle with the desired sliding properties, and a control law is designed for the error dynamics trajectories to reach the sliding subgroup globally exponentially. Tracking control is then composed of the reaching law and sliding mode, and is applied for attitude tracking on the special orthogonal group SO(3) and the unit sphere S3. Numerical simulations show the performance of the proposed geometric sliding-mode controller (GSMC) in contrast with two control schemes of the literature.
Abstract:Airports have an important role in both military and civilian domains. The synthetic aperture radar (SAR) based airport detection has received increasing attention in recent years. However, due to the high cost of SAR imaging and annotation process, there is no publicly available SAR dataset for airport detection. As a result, deep learning methods have not been fully used in airport detection tasks. To provide a benchmark for airport detection research in SAR images, this paper introduces a large-scale SAR Airport Dataset (SAD). In order to adequately reflect the demands of real world applications, it contains 624 SAR images from Sentinel 1B and covers 104 airfield instances with different scales, orientations and shapes. The experiments of multiple deep learning approach on this dataset proves its effectiveness. It developing state-of-the-art airport area detection algorithms or other relevant tasks.
Abstract:The further development of deep neural networks is hampered by the limited GPU memory resource. Therefore, the optimization of GPU memory resources is highly demanded. Swapping and recomputation are commonly applied to make better use of GPU memory in deep learning. However, as an emerging domain, several challenges remain:1)The efficiency of recomputation is limited for both static and dynamic methods. 2)Swapping requires offloading parameters manually, which incurs a great time cost. 3) There is no such dynamic and fine-grained method that involves tensor swapping together with tensor recomputation nowadays. To remedy the above issues, we propose a novel scheduler manager named DELTA(Dynamic tEnsor offLoad and recompuTAtion). To the best of our knowledge, we are the first to make a reasonable dynamic runtime scheduler on the combination of tensor swapping and tensor recomputation without user oversight. In DELTA, we propose a filter algorithm to select the optimal tensors to be released out of GPU memory and present a director algorithm to select a proper action for each of these tensors. Furthermore, prefetching and overlapping are deliberately considered to overcome the time cost caused by swapping and recomputing tensors. Experimental results show that DELTA not only saves 40%-70% of GPU memory, surpassing the state-of-the-art method to a great extent but also gets comparable convergence results as the baseline with acceptable time delay. Also, DELTA gains 2.04$\times$ maximum batchsize when training ResNet-50 and 2.25$\times$ when training ResNet-101 compared with the baseline. Besides, comparisons between the swapping cost and recomputation cost in our experiments demonstrate the importance of making a reasonable dynamic scheduler on tensor swapping and tensor recomputation, which refutes the arguments in some related work that swapping should be the first and best choice.
Abstract:Ramp metering that uses traffic signals to regulate vehicle flows from the on-ramps has been widely implemented to improve vehicle mobility of the freeway. Previous studies generally update signal timings in real-time based on predefined traffic measures collected by point detectors, such as traffic volumes and occupancies. Comparing with point detectors, traffic cameras-which have been increasingly deployed on road networks-could cover larger areas and provide more detailed traffic information. In this work, we propose a deep reinforcement learning (DRL) method to explore the potential of traffic video data in improving the efficiency of ramp metering. The proposed method uses traffic video frames as inputs and learns the optimal control strategies directly from the high-dimensional visual inputs. A real-world case study demonstrates that, in comparison with a state-of-the-practice method, the proposed DRL method results in 1) lower travel times in the mainline, 2) shorter vehicle queues at the on-ramp, and 3) higher traffic flows downstream of the merging area. The results suggest that the proposed method is able to extract useful information from the video data for better ramp metering controls.
Abstract:It is hard to train Recurrent Neural Network (RNN) with stable convergence and avoid gradient vanishing and exploding, as the weights in the recurrent unit are repeated from iteration to iteration. Moreover, RNN is sensitive to the initialization of weights and bias, which brings difficulty in the training phase. With the gradient-free feature and immunity to poor conditions, the Alternating Direction Method of Multipliers (ADMM) has become a promising algorithm to train neural networks beyond traditional stochastic gradient algorithms. However, ADMM could not be applied to train RNN directly since the state in the recurrent unit is repetitively updated over timesteps. Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of RNN to address the above challenges simultaneously and provides novel update rules and theoretical convergence analysis. We explicitly specify key update rules in the iterations of ADMMiRNN with deliberately constructed approximation techniques and solutions to each subproblem instead of vanilla ADMM. Numerical experiments are conducted on MNIST and text classification tasks, where ADMMiRNN achieves convergent results and outperforms compared baselines. Furthermore, ADMMiRNN trains RNN in a more stable way without gradient vanishing or exploding compared to the stochastic gradient algorithms. Source code has been available at https://github.com/TonyTangYu/ADMMiRNN.
Abstract:Median regression analysis has robustness properties which make it attractive compared with regression based on the mean, while differential privacy can protect individual privacy during statistical analysis of certain datasets. In this paper, three privacy preserving methods are proposed for median regression. The first algorithm is based on a finite smoothing method, the second provides an iterative way and the last one further employs the greedy coordinate descent approach. Privacy preserving properties of these three methods are all proved. Accuracy bound or convergence properties of these algorithms are also provided. Numerical calculation shows that the first method has better accuracy than the others when the sample size is small. When the sample size becomes larger, the first method needs more time while the second method needs less time with well-matched accuracy. For the third method, it costs less time in both cases, while it highly depends on step size.