Ant Group
Abstract:Malicious shell commands are linchpins to many cyber-attacks, but may not be easy to understand by security analysts due to complicated and often disguised code structures. Advances in large language models (LLMs) have unlocked the possibility of generating understandable explanations for shell commands. However, existing general-purpose LLMs suffer from a lack of expert knowledge and a tendency to hallucinate in the task of shell command explanation. In this paper, we present Raconteur, a knowledgeable, expressive and portable shell command explainer powered by LLM. Raconteur is infused with professional knowledge to provide comprehensive explanations on shell commands, including not only what the command does (i.e., behavior) but also why the command does it (i.e., purpose). To shed light on the high-level intent of the command, we also translate the natural-language-based explanation into standard technique & tactic defined by MITRE ATT&CK, the worldwide knowledge base of cybersecurity. To enable Raconteur to explain unseen private commands, we further develop a documentation retriever to obtain relevant information from complementary documentations to assist the explanation process. We have created a large-scale dataset for training and conducted extensive experiments to evaluate the capability of Raconteur in shell command explanation. The experiments verify that Raconteur is able to provide high-quality explanations and in-depth insight of the intent of the command.
Abstract:Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to the state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test dataset, which outperforms the SOTA AR Conformer with about 30x inference speedup.
Abstract:Currently, sample-specific backdoor attacks (SSBAs) are the most advanced and malicious methods since they can easily circumvent most of the current backdoor defenses. In this paper, we reveal that SSBAs are not sufficiently stealthy due to their poisoned-label nature, where users can discover anomalies if they check the image-label relationship. In particular, we demonstrate that it is ineffective to directly generalize existing SSBAs to their clean-label variants by poisoning samples solely from the target class. We reveal that it is primarily due to two reasons, including \textbf{(1)} the `antagonistic effects' of ground-truth features and \textbf{(2)} the learning difficulty of sample-specific features. Accordingly, trigger-related features of existing SSBAs cannot be effectively learned under the clean-label setting due to their mild trigger intensity required for ensuring stealthiness. We argue that the intensity constraint of existing SSBAs is mostly because their trigger patterns are `content-irrelevant' and therefore act as `noises' for both humans and DNNs. Motivated by this understanding, we propose to exploit content-relevant features, $a.k.a.$ (human-relied) attributes, as the trigger patterns to design clean-label SSBAs. This new attack paradigm is dubbed backdoor attack with attribute trigger (BAAT). Extensive experiments are conducted on benchmark datasets, which verify the effectiveness of our BAAT and its resistance to existing defenses.
Abstract:Data augmentation is vital to the generalization ability and robustness of deep neural networks (DNNs) models. Existing augmentation methods for speaker verification manipulate the raw signal, which are time-consuming and the augmented samples lack diversity. In this paper, we present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification, which can generate diversified training samples in speaker embedding space with negligible extra computing cost. Firstly, we augment training samples by perturbing speaker embeddings along semantic directions, which are obtained from speaker-wise covariance matrices. Secondly, accurate covariance matrices are estimated from robust speaker embeddings during training, so we introduce difficultyaware additive margin softmax (DAAM-Softmax) to obtain optimal speaker embeddings. Finally, we assume the number of augmented samples goes to infinity and derive a closed-form upper bound of the expected loss with DASA, which achieves compatibility and efficiency. Extensive experiments demonstrate the proposed approach can achieve a remarkable performance improvement. The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
Abstract:Convolutional neural networks (ConvNets) have been successfully applied to satellite image scene classification. Human-labeled training datasets are essential for ConvNets to perform accurate classification. Errors in human-labeled training datasets are unavoidable due to the complexity of satellite images. However, the distribution of human labeling errors on satellite images and their impact on ConvNets have not been investigated. To fill this research gap, this study, for the first time, collected real-world labels from 32 participants and explored how their errors affect three ConvNets (VGG16, GoogleNet and ResNet-50) for high-resolution satellite image scene classification. We found that: (1) human labeling errors have significant class and instance dependence, which is fundamentally different from the simulation noise in previous studies; (2) regarding the overall accuracy of all classes, when human labeling errors in training data increase by one unit, the overall accuracy of ConvNets classification decreases by approximately half a unit; (3) regarding the accuracy of each class, the impact of human labeling errors on ConvNets shows large heterogeneity across classes. To uncover the mechanism underlying the impact of human labeling errors on ConvNets, we further compared it with two types of simulated labeling noise: uniform noise (errors independent of both classes and instances) and class-dependent noise (errors independent of instances but not classes). Our results show that the impact of human labeling errors on ConvNets is similar to that of the simulated class-dependent noise but not to that of the simulated uniform noise, suggesting that the impact of human labeling errors on ConvNets is mainly due to class-dependent errors rather than instance-dependent errors.
Abstract:Prioritizing fairness is of central importance in artificial intelligence (AI) systems, especially for those societal applications, e.g., hiring systems should recommend applicants equally from different demographic groups, and risk assessment systems must eliminate racism in criminal justice. Existing efforts towards the ethical development of AI systems have leveraged data science to mitigate biases in the training set or introduced fairness principles into the training process. For a deployed AI system, however, it may not allow for retraining or tuning in practice. By contrast, we propose a more flexible approach, i.e., fairness-aware adversarial perturbation (FAAP), which learns to perturb input data to blind deployed models on fairness-related features, e.g., gender and ethnicity. The key advantage is that FAAP does not modify deployed models in terms of parameters and structures. To achieve this, we design a discriminator to distinguish fairness-related attributes based on latent representations from deployed models. Meanwhile, a perturbation generator is trained against the discriminator, such that no fairness-related features could be extracted from perturbed inputs. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed FAAP. In addition, FAAP is validated on real-world commercial deployments (inaccessible to model parameters), which shows the transferability of FAAP, foreseeing the potential of black-box adaptation.
Abstract:Time-domain reflectometry (TDR) is an established means of measuring impedance inhomogeneity of a variety of waveguides, providing critical data necessary to characterize and optimize the performance of high-bandwidth computational and communication systems. However, TDR systems with both the high spatial resolution (sub-cm) and voltage resolution (sub-$\muV$) required to evaluate high-performance waveguides are physically large and often cost-prohibitive, severely limiting their utility as testing platforms and greatly limiting their use in characterizing and trouble-shooting fielded hardware. Consequently, there exists a growing technical need for an electronically simple, portable, and low-cost TDR technology. The receiver of a TDR system plays a key role in recording reflection waveforms; thus, such a receiver must have high analog bandwidth, high sampling rate, and high-voltage resolution. However, these requirements are difficult to meet using low-cost analog-to-digital converters (ADCs). This article describes a new TDR architecture, namely, jitter-based APC (JAPC), which obviates the need for external components based on an alternative concept, analog-to-probability conversion (APC) that was recently proposed. These results demonstrate that a fully reconfigurable and highly integrated TDR (iTDR) can be implemented on a field-programmable gate array (FPGA) chip without using any external circuit components. Empirical evaluation of the system was conducted using an HDMI cable as the device under test (DUT), and the resulting impedance inhomogeneity pattern (IIP) of the DUT was extracted with spatial and voltage resolutions of 5 cm and 80 $\muV$, respectively. These results demonstrate the feasibility of using the prototypical JAPC-based iTDR for real-world waveguide characterization applications
Abstract:The task of classifying mammograms is very challenging because the lesion is usually small in the high resolution image. The current state-of-the-art approaches for medical image classification rely on using the de-facto method for ConvNets - fine-tuning. However, there are fundamental differences between natural images and medical images, which based on existing evidence from the literature, limits the overall performance gain when designed with algorithmic approaches. In this paper, we propose to go beyond fine-tuning by introducing a novel framework called MorphHR, in which we highlight a new transfer learning scheme. The idea behind the proposed framework is to integrate function-preserving transformations, for any continuous non-linear activation neurons, to internally regularise the network for improving mammograms classification. The proposed solution offers two major advantages over the existing techniques. Firstly and unlike fine-tuning, the proposed approach allows for modifying not only the last few layers but also several of the first ones on a deep ConvNet. By doing this, we can design the network front to be suitable for learning domain specific features. Secondly, the proposed scheme is scalable to hardware. Therefore, one can fit high resolution images on standard GPU memory. We show that by using high resolution images, one prevents losing relevant information. We demonstrate, through numerical and visual experiments, that the proposed approach yields to a significant improvement in the classification performance over state-of-the-art techniques, and is indeed on a par with radiology experts. Moreover and for generalisation purposes, we show the effectiveness of the proposed learning scheme on another large dataset, the ChestX-ray14, surpassing current state-of-the-art techniques.
Abstract:Though deep neural networks have achieved the state of the art performance in visual classification, recent studies have shown that they are all vulnerable to the attack of adversarial examples. In this paper, we develop improved techniques for defending against adversarial examples.First, we introduce enhanced defense using a technique we call \textbf{Attention and Adversarial Logit Pairing(AT+ALP)}, a method that encourages both attention map and logit for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, \textbf{AT+ALP} improves accuracy on adversarial examples over adversarial training.Next,We show that our \textbf{AT+ALP} can effectively increase the average activations of adversarial examples in the key area and demonstrate that it focuse on more discriminate features to improve the robustness of the model.Finally,we conducte extensive experiments using a wide range of datasets and the experiment results show that our \textbf{AT+ALP} achieves \textbf{the state of the art} defense.For example,on \textbf{17 Flower Category Database}, under strong 200-iteration \textbf{PGD} gray-box and black-box attacks where prior art has 34\% and 39\% accuracy, our method achieves \textbf{50\%} and \textbf{51\%}.Compared with previous work,our work is evaluated under highly challenging PGD attack:the maximum perturbation $\epsilon \in \{0.25,0.5\}$ i.e. $L_\infty \in \{0.25,0.5\}$ with 10 to 200 attack iterations.To our knowledge, such a strong attack has not been previously explored on a wide range of datasets.
Abstract:Many recent works demonstrated that Deep Learning models are vulnerable to adversarial examples.Fortunately, generating adversarial examples usually requires white-box access to the victim model, and the attacker can only access the APIs opened by cloud platforms. Thus, keeping models in the cloud can usually give a (false) sense of security.Unfortunately, cloud-based image classification service is not robust to simple transformations such as Gaussian Noise, Salt-and-Pepper Noise, Rotation and Monochromatization. In this paper,(1) we propose one novel attack method called Image Fusion(IF) attack, which achieve a high bypass rate,can be implemented only with OpenCV and is difficult to defend; and (2) we make the first attempt to conduct an extensive empirical study of Simple Transformation (ST) attacks against real-world cloud-based classification services. Through evaluations on four popular cloud platforms including Amazon, Google, Microsoft, Clarifai, we demonstrate that ST attack has a success rate of approximately 100% except Amazon approximately 50%, IF attack have a success rate over 98% among different classification services. (3) We discuss the possible defenses to address these security challenges.Experiments show that our defense technology can effectively defend known ST attacks.