Abstract:Multimodal learning robust to missing modality has attracted increasing attention due to its practicality. Existing methods tend to address it by learning a common subspace representation for different modality combinations. However, we reveal that they are sub-optimal due to their implicit constraint on intra-class representation. Specifically, samples with different modalities within the same class are forced to learn representations in the same direction. This hinders the model from capturing modality-specific information, resulting in insufficient learning. To this end, we propose a novel Decoupled Multimodal Representation Network (DMRNet) to assist robust multimodal learning. Specifically, DMRNet models the input from different modality combinations as a probabilistic distribution instead of a fixed point in the latent space, and samples embeddings from the distribution for the prediction module to calculate the task loss. As a result, the direction constraint from the loss minimization is blocked by the sampled representation. This relaxes the constraint on the inference representation and enables the model to capture the specific information for different modality combinations. Furthermore, we introduce a hard combination regularizer to prevent DMRNet from unbalanced training by guiding it to pay more attention to hard modality combinations. Finally, extensive experiments on multimodal classification and segmentation tasks demonstrate that the proposed DMRNet significantly outperforms the state of the art.
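The idea of representing an input as a distribution and feeding a sampled embedding to the prediction module can be illustrated with a minimal sketch. It assumes a diagonal Gaussian parameterised by a mean and a log-variance, and it assumes the mean is used as the inference representation; neither choice is stated in the abstract, and the function name `sample_embedding` and all numbers are hypothetical.

```python
import math
import random


def sample_embedding(mean, log_var, rng):
    """Draw z = mean + std * eps, with eps ~ N(0, 1) per dimension.

    Assumption: the latent distribution is a diagonal Gaussian.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


rng = random.Random(0)
mean = [0.5, -1.0, 2.0]        # hypothetical distribution mean from an encoder
log_var = [-2.0, -2.0, -2.0]   # hypothetical per-dimension log-variance

# Training: the task loss would be computed on the sampled embedding.
z_train = sample_embedding(mean, log_var, rng)

# Inference (assumption): use the mean as a deterministic representation.
z_infer = list(mean)
```

Because the loss acts on the sample rather than on the mean itself, the mean is not pushed directly toward a single class direction, which is the intuition the abstract describes.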
Abstract:This letter aims to provide a fundamental analytical comparison of the two major types of relaying methods: intelligent reflecting surfaces (IRSs) and full-duplex relays, particularly focusing on unmanned aerial vehicle (UAV) communication scenarios. Both amplify-and-forward (AF) and decode-and-forward (DF) relaying schemes are included in the comparison. In addition, the optimal 3D UAV deployment and the minimum transmit power under the quality of service constraint are derived. Our numerical results show that IRSs of medium size exhibit comparable performance to AF relays, while outperforming DF relays at extremely large surface sizes and high data rates.
Abstract:Learning based on multimodal data has attracted increasing interest recently. While a variety of sensory modalities can be collected for training, not all of them are always available in development scenarios, which raises the challenge of inferring with incomplete modalities. To address this issue, this paper presents a one-stage modality distillation framework that unifies the privileged knowledge transfer and modality information fusion into a single optimization procedure via multi-task learning. Compared with the conventional modality distillation that performs them independently, this helps to capture the valuable representation that can assist the final model inference directly. Specifically, we propose the joint adaptation network for the modality transfer task to preserve the privileged information. This addresses the representation heterogeneity caused by input discrepancy via the joint distribution adaptation. Then, we introduce the cross translation network for the modality fusion task to aggregate the restored and available modality features. It leverages the parameter-sharing strategy to capture the cross-modal cues explicitly. Extensive experiments on RGB-D classification and segmentation tasks demonstrate that the proposed multimodal inheritance framework can overcome the problem of incomplete modality input in various scenes and achieve state-of-the-art performance.
Abstract:Multimodal learning has shown great potential in numerous scenarios and has attracted increasing interest recently. However, it often encounters the problem of missing modality data and thus suffers severe performance degradation in practice. To this end, we propose a general framework called MMANet to assist incomplete multimodal learning. It consists of three components: the deployment network used for inference, the teacher network transferring comprehensive multimodal information to the deployment network, and the regularization network guiding the deployment network to balance weak modality combinations. Specifically, we propose a novel margin-aware distillation (MAD) to assist the information transfer by weighing the sample contribution with the classification uncertainty. This encourages the deployment network to focus on the samples near decision boundaries and acquire the refined inter-class margin. Besides, we design a modality-aware regularization (MAR) algorithm to mine the weak modality combinations and guide the regularization network to calculate prediction loss for them. This forces the deployment network to improve its representation ability for the weak modality combinations adaptively. Finally, extensive experiments on multimodal classification and segmentation tasks demonstrate that our MMANet significantly outperforms the state of the art. Code is available at: https://github.com/shicaiwei123/MMANet
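Weighing each sample's distillation loss by classification uncertainty can be sketched as follows. This is an illustration only, not the MAD definition from the MMANet code: it assumes the uncertainty is the normalised entropy of the teacher's softmax output and that the per-sample loss is the KL divergence from teacher to student. All function names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uncertainty_weight(probs):
    """Normalised entropy in [0, 1]; close to 1 near a decision boundary."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 where p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def weighted_distillation_loss(teacher_logits, student_logits):
    """Uncertainty-weighted average of per-sample teacher-student KL terms."""
    weights, losses = [], []
    for t, s in zip(teacher_logits, student_logits):
        pt, ps = softmax(t), softmax(s)
        weights.append(uncertainty_weight(pt))
        losses.append(kl_divergence(pt, ps))
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total
```

Under this assumed weighting, a sample the teacher classifies confidently contributes little, while a sample with near-uniform teacher probabilities dominates the loss, matching the stated goal of focusing on samples near decision boundaries.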
Abstract:The extensive damage caused by malware requires anti-malware systems to be constantly improved to prevent new threats. The current trend in malware detection is to employ machine learning models to aid in the classification process. We propose a new dataset with the objective of improving current anti-malware systems. The focus of this dataset is to improve host-based intrusion detection systems by providing API call sequences for thousands of malware samples executed in Windows 10 virtual machines. A tutorial on how to create and expand this dataset is provided, along with a benchmark demonstrating how to use this dataset to classify malware. The data contains long sequences of API calls for each sample, and in order to create models that can be deployed on resource-constrained devices, three feature selection methods were tested. The principal innovation, however, lies in the multi-label classification system, in which one sequence of API calls can be tagged with multiple labels describing its malicious behaviours.
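A multi-label target of the kind described, where one API call sequence carries several behaviour tags, is commonly encoded as a multi-hot vector. The sketch below assumes that encoding; the label vocabulary and the helper `multi_hot` are hypothetical and are not taken from the dataset.

```python
# Hypothetical behaviour labels; the dataset's real label set may differ.
VOCABULARY = ["keylogger", "ransomware", "trojan"]


def multi_hot(labels, vocabulary):
    """Return a 0/1 vector with a 1 at the position of every assigned label."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for label in labels:
        vector[index[label]] = 1
    return vector


# One sample tagged with two behaviours at once.
target = multi_hot(["trojan", "keylogger"], VOCABULARY)
```

A classifier trained on such targets predicts each position independently, so a single sequence can be assigned any subset of the labels.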
Abstract:Automatic modulation recognition (AMR) detects the modulation scheme of the received signals for further signal processing without needing prior information, and provides the essential function when such information is missing. Recent breakthroughs in deep learning (DL) have laid the foundation for developing high-performance DL-AMR approaches for communications systems. Compared with traditional modulation detection methods, DL-AMR approaches have achieved promising performance, including high recognition accuracy and low false alarm rates, due to the strong feature extraction and classification abilities of deep neural networks. Despite the promising potential, DL-AMR approaches also raise concerns about complexity and explainability, which affect practical deployment in wireless communications systems. This paper aims to present a review of the current DL-AMR research, with a focus on appropriate DL models and benchmark datasets. We further provide comprehensive experiments to compare the state-of-the-art models for single-input-single-output (SISO) systems from both accuracy and complexity perspectives, and propose to apply DL-AMR in the new multiple-input-multiple-output (MIMO) scenario with precoding. Finally, existing challenges and possible future research directions are discussed.
Abstract:Stereo visual odometry is widely used where a robot tracks its position and orientation using stereo cameras. Most approaches recover the robot's motion by matching and tracking point features along a sequence of stereo images. However, in low-textured and dynamic scenes there are not enough robust static point features for motion estimation, causing much previous work to fail to reconstruct the robot's motion. Line features, in contrast, can still be detected in such low-textured and dynamic scenes. In this paper, we propose DynPL-SVO, a stereo visual odometry method with the dynamic grid algorithm and a cost function containing both vertical and horizontal information of the line features. Stereo camera motion is obtained through Levenberg-Marquardt minimization of the re-projection error of point and line features. The experimental results on the KITTI and EuRoC MAV datasets show that DynPL-SVO performs competitively with other state-of-the-art systems, producing more robust and accurate motion estimation, especially in low-textured and dynamic scenes.
Abstract:Automatic modulation recognition (AMR) is a promising technology for intelligent communication receivers to detect signal modulation schemes. Recently, emerging deep learning (DL) research has facilitated high-performance DL-AMR approaches. However, most DL-AMR models focus only on recognition accuracy, leading to huge model sizes and high computational complexity, while some lightweight and low-complexity models struggle to meet the accuracy requirements. This letter proposes an efficient DL-AMR model based on phase parameter estimation and transformation, with a convolutional neural network (CNN) and a gated recurrent unit (GRU) as the feature extraction layers, which achieves recognition accuracy equivalent to the existing state-of-the-art models while reducing their parameter count by more than a third. Meanwhile, our model is more competitive in training time and test time than the benchmark models with similar recognition accuracy. Moreover, we further propose to compress our model by pruning, which maintains a recognition accuracy higher than 90% while having fewer than 1/8 of the parameters of state-of-the-art models.
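The abstract does not say which pruning criterion is used. As one common possibility, the sketch below shows unstructured magnitude pruning, which zeroes all but the largest-magnitude weights; the function `magnitude_prune` and the example weights are hypothetical and are not the authors' procedure.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero every weight except the largest-magnitude `keep_fraction` of them.

    Assumption: plain magnitude-based selection over a flat weight list.
    At least one weight is always kept.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(len(weights) * keep_fraction))
    ranked = sorted(range(len(weights)),
                    key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


weights = [0.1, -2.0, 0.03, 1.5, -0.2, 0.4, 0.05, -0.6]
pruned = magnitude_prune(weights, 0.125)  # keep 1/8 of the weights
```

In practice a pruned model is usually fine-tuned afterwards to recover accuracy; whether that applies here is not stated in the abstract.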
Abstract:Many research papers that propose models to predict the course of the COVID-19 pandemic use either handcrafted statistical models or large neural networks. Even though large neural networks are more powerful than simpler statistical models, they are especially hard to train on small datasets. This paper presents a model that not only has greater flexibility than the other proposed neural networks but is also effective on smaller datasets. To improve performance on small data, six regularisation methods were tested. The results show that the GRU combined with 20% Dropout achieved the lowest RMSE scores. The main finding was that models with less access to data relied more on the regulariser. Applying Dropout to a GRU model trained on only 28 days of data reduced the RMSE by 23%.
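For readers unfamiliar with the metric, the sketch below shows how RMSE and a relative reduction such as the reported 23% are computed. The helper names and the example values are hypothetical and are not results from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Fractional decrease of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical scores: an RMSE falling from 100.0 to 77.0 is a 23% reduction.
reduction = relative_reduction(100.0, 77.0)
```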
Abstract:This paper proposes the adaptive fractional order graph neural network (AFGNN), optimized by a time-varying fractional order gradient descent method, to address the local-optimum problem of classic and fractional GNNs, which specialise in aggregating information from the feature and adjacency matrices of connected nodes and their neighbours to solve learning tasks on non-Euclidean data such as graphs. To overcome the high computational complexity of fractional order derivatives, the proposed model calculates the fractional order gradients approximately. We further prove that this approximation is feasible and that the AFGNN is unbiased. Extensive experiments on benchmark citation networks and object recognition challenges confirm the performance of AFGNN. The first group of experiments shows that AFGNN outperforms the steepest-gradient-based method and conventional GNNs on the citation networks. The second group of experiments demonstrates that AFGNN excels at image recognition tasks where the images have a significant number of missing pixels, achieving higher accuracy than GNNs.