Abstract: The accuracy-speed-memory trade-off is a primary consideration in many computer vision perception tasks. Previous methods mainly focus on a single task or a small number of tasks, for example by designing effective data augmentation, feature extractors, or learning strategies. These approaches, however, can be inherently task-specific: the performance of the proposed model may depend on a particular perception task or dataset. To explore common learning patterns and increase module robustness, we propose the EffiPerception framework. It achieves strong accuracy-speed performance with relatively low memory cost across four perception tasks: 2D Object Detection, 3D Object Detection, 2D Instance Segmentation, and 3D Point Cloud Segmentation. The framework consists of three parts: (1) Efficient Feature Extractors, which extract the input features for each modality; (2) Efficient Layers, plug-in plug-out layers that further process the feature representation, aggregating core learned information while pruning noisy proposals; and (3) EffiOptim, an 8-bit optimizer that further reduces computational cost and improves performance stability. Extensive experiments on the KITTI, SemanticKITTI, and COCO datasets show that EffiPerception improves overall accuracy-speed-memory performance on the four detection and segmentation tasks compared with earlier, well-respected methods.
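The abstract does not describe how EffiOptim stores its state. As a hedged illustration of the general idea behind 8-bit optimizers, the sketch below quantizes an optimizer-state vector (for example, momentum) to signed 8-bit codes with one absmax scale per block and dequantizes it again. The block size and the function names `quantize_blockwise` and `dequantize_blockwise` are assumptions for illustration, not EffiOptim's actual design.

```python
# Hypothetical sketch of blockwise absmax 8-bit quantization of optimizer state.
# This is NOT EffiOptim's published method; block_size and names are assumptions.

def quantize_blockwise(values, block_size=4):
    """Quantize floats to int8 codes in [-127, 127] with one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # avoid division by zero for all-zero blocks
        scales.append(scale)
        codes.extend(round(v / scale * 127) for v in block)
    return codes, scales

def dequantize_blockwise(codes, scales, block_size=4):
    """Map int8 codes back to floats using the stored per-block scales."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]

# Illustrative momentum values (made up for this example).
momentum = [0.5, -0.25, 0.1, 0.0, 2.0, -1.0, 0.3, 0.05]
codes, scales = quantize_blockwise(momentum)
restored = dequantize_blockwise(codes, scales)
```

Storing one byte per element plus one float per block is what reduces optimizer memory; the per-block scale keeps a single large value from destroying the precision of every other element.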
Abstract: Machine learning has become a pervasive tool in climate science applications. However, current models fail to address nonstationarity induced by anthropogenic changes in greenhouse gas emissions and do not routinely quantify the uncertainty of their projections. In this paper, we model the Atlantic Meridional Overturning Circulation (AMOC), which is of major importance to the climate of Europe and the US East Coast because it transports warm water to these regions, and which has the potential for abrupt collapse. We generate arbitrarily extreme climate scenarios over arbitrary time scales, which we then predict using neural networks. Our analysis shows that the AMOC is predictable with neural networks under a diverse set of climate scenarios. Further experiments reveal that MLPs and Deep Ensembles can learn the physics of the AMOC instead of imitating its progression through autocorrelation. Quantifying uncertainty reveals an intriguing pattern of "spikes" before critical points of AMOC collapse, which casts doubt on previous analyses that predicted an AMOC collapse within this century. Our results show that Bayesian Neural Networks perform poorly compared with denser architectures, and that care should be taken when applying neural networks to nonstationary scenarios such as climate projections. Further, our results suggest that large neural network models may have difficulty modeling global Earth System dynamics accurately and being applied successfully to nonstationary climate scenarios, because the underlying physics is challenging for neural networks to capture.
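The abstract relies on Deep Ensembles to quantify predictive uncertainty but does not state how member outputs are combined. One common approach, assumed here rather than taken from the paper, treats each member as predicting a Gaussian mean and variance and combines them as a uniform mixture: the ensemble variance is the average predicted (aleatoric) variance plus the spread of the member means (epistemic). The function name `combine_ensemble` and the example numbers are hypothetical.

```python
# Hypothetical sketch: combine Gaussian predictions from M ensemble members.
# Assumes each member outputs a mean and a variance for the same target.

def combine_ensemble(member_means, member_vars):
    """Return (mean, variance) of a uniform mixture of the members' Gaussians."""
    m = len(member_means)
    mean = sum(member_means) / m
    aleatoric = sum(member_vars) / m                                # average predicted noise
    epistemic = sum((mu - mean) ** 2 for mu in member_means) / m    # disagreement between members
    return mean, aleatoric + epistemic

# Illustrative values for three members predicting AMOC strength at one time step.
mean, variance = combine_ensemble([17.0, 18.0, 19.0], [0.5, 0.5, 0.5])
```

Splitting the variance this way makes it visible whether a widening uncertainty band comes from the members disagreeing or from each member predicting more noise.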
Abstract: Good 3D object detection performance from LiDAR-Camera sensors requires seamless feature alignment and fusion strategies. In this paper, we propose the 3DifFusionDet framework, which formulates 3D object detection as a denoising diffusion process from noisy 3D boxes to target boxes. During training, ground-truth boxes are diffused toward a random distribution, and the model learns to reverse this noising process. During inference, the model progressively refines a set of randomly generated boxes into the final predictions. Combined with the feature alignment strategy, this progressive refinement can contribute significantly to robust LiDAR-Camera fusion. The iterative refinement process also makes the framework adaptable to detection settings that require different levels of accuracy and speed. Extensive experiments on KITTI, a benchmark for real-world traffic object detection, show that 3DifFusionDet performs favorably compared with earlier, well-respected detectors.
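The training-time noising step described above can be sketched with the standard forward diffusion formula x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. This is a minimal sketch under stated assumptions: the cosine schedule, the 7-parameter box encoding (x, y, z, w, l, h, yaw) already normalized to a small range, and the function names `alpha_bar` and `noise_boxes` are all illustrative choices, not details confirmed by the abstract.

```python
import math
import random

# Hypothetical sketch of the forward (noising) process applied to 3D boxes.
# Boxes are assumed to be lists of normalized parameters, e.g. (x, y, z, w, l, h, yaw).

def alpha_bar(t, num_steps, s=0.008):
    """Cosine schedule: fraction of signal variance kept at step t (exactly 1.0 at t=0)."""
    f = lambda u: math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

def noise_boxes(gt_boxes, t, num_steps=1000, rng=random):
    """Sample q(x_t | x_0): blend every box parameter with Gaussian noise at step t."""
    a = alpha_bar(t, num_steps)
    return [
        [math.sqrt(a) * p + math.sqrt(1 - a) * rng.gauss(0.0, 1.0) for p in box]
        for box in gt_boxes
    ]

gt = [[0.1, -0.2, 0.05, 0.3, 0.6, 0.25, 0.0]]
slightly_noisy = noise_boxes(gt, t=50, rng=random.Random(0))
almost_pure_noise = noise_boxes(gt, t=1000, rng=random.Random(0))
```

A detector trained on such samples learns to predict the clean boxes from `x_t`; at inference one would start from pure Gaussian boxes and apply the learned denoiser over one or more steps, which is where the accuracy-speed flexibility comes from.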
Abstract: The loss function is arguably among the most important hyperparameters of a neural network. Many loss functions have been designed to date, which makes choosing the right one nontrivial. However, related work rarely gives a detailed justification for its choice of loss function. As we see it, this indicates a dogmatic mindset in the deep learning community that lacks an empirical foundation. In this work, we consider deep neural networks in a supervised classification setting and analyze the impact that the choice of loss function has on the training result. While certain loss functions perform suboptimally, our work empirically shows that under-represented losses such as the KL divergence can significantly outperform state-of-the-art choices, highlighting the need to treat the loss function as a tuned hyperparameter rather than a fixed choice.
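Treating the loss as a hyperparameter means selecting it by name, just like a learning rate. The minimal sketch below defines cross-entropy and the KL divergence over probability vectors and registers them in a lookup table; the registry name `LOSSES` and helper `evaluate_loss` are hypothetical, and a real training setup would use a framework's batched, numerically stable implementations instead.

```python
import math

def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy H(target, pred) between two probability vectors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred); entries with zero target probability contribute nothing."""
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, pred) if t > 0)

# Hypothetical registry: the loss becomes a searchable hyperparameter.
LOSSES = {"cross_entropy": cross_entropy, "kl_divergence": kl_divergence}

def evaluate_loss(name, target, pred):
    """Look up a loss by name and evaluate it on one (target, prediction) pair."""
    return LOSSES[name](target, pred)

target, pred = [0.2, 0.8], [0.5, 0.5]
scores = {name: evaluate_loss(name, target, pred) for name in LOSSES}
```

A hyperparameter search would then iterate over the keys of `LOSSES` alongside other settings and keep the loss with the best validation result, rather than fixing the loss in advance.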