Abstract:Integrating supervised contrastive loss to cross entropy-based communication has recently been proposed as a solution to address the long-tail learning problem. However, when the class imbalance ratio is high, it requires adjusting the supervised contrastive loss to support the tail classes, as the conventional contrastive learning is biased towards head classes by default. To this end, we present Rebalanced Contrastive Learning (RCL), an efficient means to increase the long tail classification accuracy by addressing three main aspects: 1. Feature space balancedness - Equal division of the feature space among all the classes, 2. Intra-Class compactness - Reducing the distance between same-class embeddings, 3. Regularization - Enforcing larger margins for tail classes to reduce overfitting. RCL adopts class frequency-based SoftMax loss balancing to supervised contrastive learning loss and exploits scalar multiplied features fed to the contrastive learning loss to enforce compactness. We implement RCL on the Balanced Contrastive Learning (BCL) Framework, which has the SOTA performance. Our experiments on three benchmark datasets demonstrate the richness of the learnt embeddings and increased top-1 balanced accuracy RCL provides to the BCL framework. We further demonstrate that the performance of RCL as a standalone loss also achieves state-of-the-art level accuracy.
Abstract:Robots typically possess sensors of different modalities, such as colour cameras, inertial measurement units, and 3D laser scanners. Often, solving a particular problem becomes easier when more than one modality is used. However, while there are undeniable benefits to combine sensors of different modalities the process tends to be complicated. Segmenting scenes observed by the robot into a discrete set of classes is a central requirement for autonomy as understanding the scene is the first step to reason about future situations. Scene segmentation is commonly performed using either image data or 3D point cloud data. In computer vision many successful methods for scene segmentation are based on conditional random fields (CRF) where the maximum a posteriori (MAP) solution to the segmentation can be obtained by inference. In this paper we devise a new CRF inference method for scene segmentation that incorporates global constraints, enforcing the sets of nodes are assigned the same class label. To do this efficiently, the CRF is formulated as a relaxed quadratic program whose MAP solution is found using a gradient-based optimisation approach. The proposed method is evaluated on images and 3D point cloud data gathered in urban environments where image data provides the appearance features needed by the CRF, while the 3D point cloud data provides global spatial constraints over sets of nodes. Comparisons with belief propagation, conventional quadratic programming relaxation, and higher order potential CRF show the benefits of the proposed method.