Abstract:Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples. In assuming the base dataset seen during meta-training and novel dataset used for evaluation can come from different domains, cross-domain few-shot learning alleviates data collection and annotation costs required by methods with greater supervision and conventional (single-domain) few-shot methods. While this form of learning has been extensively studied for image classification, studies in cross-domain FSAR (CD-FSAR) are limited to proposing a model, rather than first understanding the cross-domain capabilities of existing models. To this end, we systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks with increasing difficulty, measured based on the domain shift between the base and novel set. Our empirical meta-analysis reveals a correlation between domain difference and downstream few-shot performance, and uncovers several important insights into which model aspects are effective for CD-FSAR and which need further development. Namely, we find that as the domain difference increases, the simple transfer-learning approach outperforms other methods by over 12 percentage points, and under these more challenging cross-domain settings, the specialised cross-domain model achieves the lowest performance. We also witness state-of-the-art single-domain FSAR models which use temporal alignment achieving similar or worse performance than earlier methods which do not, suggesting existing temporal alignment techniques fail to generalise on unseen domains. To the best of our knowledge, we are the first to systematically study the CD-FSAR problem in-depth. We hope the insights and challenges revealed in our study inspires and informs future work in these directions.
Abstract:Accurate and efficient object detection is crucial for safe and efficient operation of earth-moving equipment in mining. Traditional 2D image-based methods face limitations in dynamic and complex mine environments. To overcome these challenges, 3D object detection using point cloud data has emerged as a comprehensive approach. However, training models for mining scenarios is challenging due to sensor height variations, viewpoint changes, and the need for diverse annotated datasets. This paper presents novel contributions to address these challenges. We introduce a synthetic dataset SimMining 3D [1] specifically designed for 3D object detection in mining environments. The dataset captures objects and sensors positioned at various heights within mine benches, accurately reflecting authentic mining scenarios. An automatic annotation pipeline through ROS interface reduces manual labor and accelerates dataset creation. We propose evaluation metrics accounting for sensor-to-object height variations and point cloud density, enabling accurate model assessment in mining scenarios. Real data tests validate our models effectiveness in object prediction. Our ablation study emphasizes the importance of altitude and height variation augmentations in improving accuracy and reliability. The publicly accessible synthetic dataset [1] serves as a benchmark for supervised learning and advances object detection techniques in mining with complimentary pointwise annotations for each scene. In conclusion, our work bridges the gap between synthetic and real data, addressing the domain shift challenge in 3D object detection for mining. We envision robust object detection systems enhancing safety and efficiency in mining and related domains.
Abstract:This paper illustrates an application of machine learning (ML) within a complex system that performs grade estimation. In surface mining, assay measurements taken from production drilling often provide rich information that allows initially inaccurate surfaces created using sparse exploration data to be revised and subsequently improved. Recently, a Bayesian warping technique has been proposed to reshape modeled surfaces based on geochemical and spatial constraints imposed by newly acquired blasthole data. This paper focuses on incorporating machine learning in this warping framework to make the likelihood computation generalizable. The technique works by adjusting the position of vertices on the surface to maximize the integrity of modeled geological boundaries with respect to sparse geochemical observations. Its foundation is laid by a Bayesian derivation in which the geological domain likelihood given the chemistry, p(g|c), plays a similar role to p(y(c)|g). This observation allows a manually calibrated process centered around the latter to be automated since ML techniques may be used to estimate the former in a data-driven way. Machine learning performance is evaluated for gradient boosting, neural network, random forest and other classifiers in a binary and multi-class context using precision and recall rates. Once ML likelihood estimators are integrated in the surface warping framework, surface shaping performance is evaluated using unseen data by examining the categorical distribution of test samples located above and below the warped surface. Large-scale validation experiments are performed to assess the overall efficacy of ML assisted surface warping as a fully integrated component within an ore grade estimation system where the posterior mean is obtained via Gaussian Process inference with a Matern 3/2 kernel.