Abstract:Detecting human actions is a crucial task for autonomous robots and vehicles, often requiring the integration of various data modalities for improved accuracy. In this study, we introduce a novel approach to Human Action Recognition (HAR) based on skeleton and visual cues. Our method leverages a language model to guide the feature extraction process in the skeleton encoder. Specifically, we employ learnable prompts for the language model conditioned on the skeleton modality to optimize feature representation. Furthermore, we propose a fusion mechanism that combines dual-modality features using a salient fusion module, incorporating attention and transformer mechanisms to address the modalities' high dimensionality. This fusion process prioritizes informative video frames and body joints, enhancing the recognition accuracy of human actions. Additionally, we introduce a new dataset tailored for real-world robotic applications in construction sites, featuring visual, skeleton, and depth data modalities, named VolvoConstAct. This dataset serves to facilitate the training and evaluation of machine learning models to instruct autonomous construction machines for performing necessary tasks in the real world construction zones. To evaluate our approach, we conduct experiments on our dataset as well as three widely used public datasets, NTU-RGB+D, NTU-RGB+D120 and NW-UCLA. Results reveal that our proposed method achieves promising performance across all datasets, demonstrating its robustness and potential for various applications. The codes and dataset are available at: https://mmahdavian.github.io/ls_har/
Abstract:The increasing demand for autonomous machines in construction environments necessitates the development of robust object detection algorithms that can perform effectively across various weather and environmental conditions. This paper introduces a new semantic segmentation dataset specifically tailored for construction sites, taking into account the diverse challenges posed by adverse weather and environmental conditions. The dataset is designed to enhance the training and evaluation of object detection models, fostering their adaptability and reliability in real-world construction applications. Our dataset comprises annotated images captured under a wide range of different weather conditions, including but not limited to sunny days, rainy periods, foggy atmospheres, and low-light situations. Additionally, environmental factors such as the existence of dirt/mud on the camera lens are integrated into the dataset through actual captures and synthetic generation to simulate the complex conditions prevalent in construction sites. We also generate synthetic images of the annotations including precise semantic segmentation masks for various objects commonly found in construction environments, such as wheel loader machines, personnel, cars, and structural elements. To demonstrate the dataset's utility, we evaluate state-of-the-art object detection algorithms on our proposed benchmark. The results highlight the dataset's success in adversarial training models across diverse conditions, showcasing its efficacy compared to existing datasets that lack such environmental variability.
Abstract:Sparse Neural Networks (SNNs) can potentially demonstrate similar performance to their dense counterparts while saving significant energy and memory at inference. However, the accuracy drop incurred by SNNs, especially at high pruning ratios, can be an issue in critical deployment conditions. While recent works mitigate this issue through sophisticated pruning techniques, we shift our focus to an overlooked factor: hyperparameters and activation functions. Our analyses have shown that the accuracy drop can additionally be attributed to (i) Using ReLU as the default choice for activation functions unanimously, and (ii) Fine-tuning SNNs with the same hyperparameters as dense counterparts. Thus, we focus on learning a novel way to tune activation functions for sparse networks and combining these with a separate hyperparameter optimization (HPO) regime for sparse networks. By conducting experiments on popular DNN models (LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0) trained on MNIST, CIFAR-10, and ImageNet-16 datasets, we show that the novel combination of these two approaches, dubbed Sparse Activation Function Search, short: SAFS, results in up to 15.53%, 8.88%, and 6.33% absolute improvement in the accuracy for LeNet-5, VGG-16, and ResNet-18 over the default training protocols, especially at high pruning ratios. Our code can be found at https://github.com/automl/SAFS
Abstract:This paper presents a novel machine learning framework for detecting Paroxysmal Atrial Fibrillation (PxAF), a pathological characteristic of Electrocardiogram (ECG) that can lead to fatal conditions such as heart attack. To enhance the learning process, the framework involves a Generative Adversarial Network (GAN) along with a Neural Architecture Search (NAS) in the data preparation and classifier optimization phases. The GAN is innovatively invoked to overcome the class imbalance of the training data by producing the synthetic ECG for PxAF class in a certified manner. The effect of the certified GAN is statistically validated. Instead of using a general-purpose classifier, the NAS automatically designs a highly accurate convolutional neural network architecture customized for the PxAF classification task. Experimental results show that the accuracy of the proposed framework exhibits a high value of 99% which not only enhances state-of-the-art by up to 5.1%, but also improves the classification performance of the two widely-accepted baseline methods, ResNet-18, and Auto-Sklearn, by 2.2% and 6.1%.
Abstract:The deployment of Convolutional Neural Networks (CNNs) on edge devices is hindered by the substantial gap between performance requirements and available processing power. While recent research has made large strides in developing network pruning methods for reducing the computing overhead of CNNs, there remains considerable accuracy loss, especially at high pruning ratios. Questioning that the architectures designed for non-pruned networks might not be effective for pruned networks, we propose to search architectures for pruning methods by defining a new search space and a novel search objective. To improve the generalization of the pruned networks, we propose two novel PrunedConv and PrunedLinear operations. Specifically, these operations mitigate the problem of unstable gradients by regularizing the objective function of the pruned networks. The proposed search objective enables us to train architecture parameters regarding the pruned weight elements. Quantitative analyses demonstrate that our searched architectures outperform those used in the state-of-the-art pruning networks on CIFAR-10 and ImageNet. In terms of hardware effectiveness, PR-DARTS increases MobileNet-v2's accuracy from 73.44% to 81.35% (+7.91% improvement) and runs 3.87$\times$ faster.