Abstract:Scanning for trojan (backdoor) in deep neural networks is crucial due to their significant real-world applications. There has been an increasing focus on developing effective general trojan scanning methods across various trojan attacks. Despite advancements, there remains a shortage of methods that perform effectively without preconceived assumptions about the backdoor attack method. Additionally, we have observed that current methods struggle to identify classifiers trojaned using adversarial training. Motivated by these challenges, our study introduces a novel scanning method named TRODO (TROjan scanning by Detection of adversarial shifts in Out-of-distribution samples). TRODO leverages the concept of "blind spots"--regions where trojaned classifiers erroneously identify out-of-distribution (OOD) samples as in-distribution (ID). We scan for these blind spots by adversarially shifting OOD samples towards in-distribution. The increased likelihood of perturbed OOD samples being classified as ID serves as a signature for trojan detection. TRODO is both trojan and label mapping agnostic, effective even against adversarially trained trojaned classifiers. It is applicable even in scenarios where training data is absent, demonstrating high accuracy and adaptability across various scenarios and datasets, highlighting its potential as a robust trojan scanning strategy.
Abstract:There have been several efforts to improve Novelty Detection (ND) performance. However, ND methods often suffer significant performance drops under minor distribution shifts caused by changes in the environment, known as style shifts. This challenge arises from the ND setup, where the absence of out-of-distribution (OOD) samples during training causes the detector to be biased toward the dominant style features in the in-distribution (ID) data. As a result, the model mistakenly learns to correlate style with core features, using this shortcut for detection. Robust ND is crucial for real-world applications like autonomous driving and medical imaging, where test samples may have different styles than the training data. Motivated by this, we propose a robust ND method that crafts an auxiliary OOD set with style features similar to the ID set but with different core features. Then, a task-based knowledge distillation strategy is utilized to distinguish core features from style features and help our model rely on core features for discriminating crafted OOD and ID sets. We verified the effectiveness of our method through extensive experimental evaluations on several datasets, including synthetic and real-world benchmarks, against nine different ND methods.
Abstract:In recent years, there have been significant improvements in various forms of image outlier detection. However, outlier detection performance under adversarial settings lags far behind that in standard settings. This is due to the lack of effective exposure to adversarial scenarios during training, especially on unseen outliers, leading to detection models failing to learn robust features. To bridge this gap, we introduce RODEO, a data-centric approach that generates effective outliers for robust outlier detection. More specifically, we show that incorporating outlier exposure (OE) and adversarial training can be an effective strategy for this purpose, as long as the exposed training outliers meet certain characteristics, including diversity, and both conceptual differentiability and analogy to the inlier samples. We leverage a text-to-image model to achieve this goal. We demonstrate both quantitatively and qualitatively that our adaptive OE method effectively generates ``diverse'' and ``near-distribution'' outliers, leveraging information from both text and image domains. Moreover, our experimental results show that utilizing our synthesized outliers significantly enhances the performance of the outlier detector, particularly in adversarial settings.
Abstract:Despite significant progress in Anomaly Detection (AD), the robustness of existing detection methods against adversarial attacks remains a challenge, compromising their reliability in critical real-world applications such as autonomous driving. This issue primarily arises from the AD setup, which assumes that training data is limited to a group of unlabeled normal samples, making the detectors vulnerable to adversarial anomaly samples during testing. Additionally, implementing adversarial training as a safeguard encounters difficulties, such as formulating an effective objective function without access to labels. An ideal objective function for adversarial training in AD should promote strong perturbations both within and between the normal and anomaly groups to maximize margin between normal and anomaly distribution. To address these issues, we first propose crafting a pseudo-anomaly group derived from normal group samples. Then, we demonstrate that adversarial training with contrastive loss could serve as an ideal objective function, as it creates both inter- and intra-group perturbations. However, we notice that spurious negative pairs compromise the conventional contrastive loss to achieve robust AD. Spurious negative pairs are those that should be closely mapped but are erroneously separated. These pairs introduce noise and misguide the direction of inter-group adversarial perturbations. To overcome the effect of spurious negative pairs, we define opposite pairs and adversarially pull them apart to strengthen inter-group perturbations. Experimental results demonstrate our superior performance in both clean and adversarial scenarios, with a 26.1% improvement in robust detection across various challenging benchmark datasets. The implementation of our work is available at: https://github.com/rohban-lab/COBRA.
Abstract:Novelty Detection (ND) plays a crucial role in machine learning by identifying new or unseen data during model inference. This capability is especially important for the safe and reliable operation of automated systems. Despite advances in this field, existing techniques often fail to maintain their performance when subject to adversarial attacks. Our research addresses this gap by marrying the merits of nearest-neighbor algorithms with robust features obtained from models pretrained on ImageNet. We focus on enhancing the robustness and performance of ND algorithms. Experimental results demonstrate that our approach significantly outperforms current state-of-the-art methods across various benchmarks, particularly under adversarial conditions. By incorporating robust pretrained features into the k-NN algorithm, we establish a new standard for performance and robustness in the field of robust ND. This work opens up new avenues for research aimed at fortifying machine learning systems against adversarial vulnerabilities. Our implementation is publicly available at https://github.com/rohban-lab/ZARND.
Abstract:Text-to-image (T2I) generative models, such as Stable Diffusion and DALL-E, have shown remarkable proficiency in producing high-quality, realistic, and natural images from textual descriptions. However, these models sometimes fail to accurately capture all the details specified in the input prompts, particularly concerning entities, attributes, and spatial relationships. This issue becomes more pronounced when the prompt contains novel or complex compositions, leading to what are known as compositional generation failure modes. Recently, a new open-source diffusion-based T2I model, FLUX, has been introduced, demonstrating strong performance in high-quality image generation. Additionally, autoregressive T2I models like LlamaGen have claimed competitive visual quality performance compared to diffusion-based models. In this study, we evaluate the compositional generation capabilities of these newly introduced models against established models using the T2I-CompBench benchmark. Our findings reveal that LlamaGen, as a vanilla autoregressive model, is not yet on par with state-of-the-art diffusion models for compositional generation tasks under the same criteria, such as model size and inference time. On the other hand, the open-source diffusion-based model FLUX exhibits compositional generation capabilities comparable to the state-of-the-art closed-source model DALL-E3.
Abstract:Text-to-image diffusion models, such as Stable Diffusion and DALL-E, are capable of generating high-quality, diverse, and realistic images from textual prompts. However, they sometimes struggle to accurately depict specific entities described in prompts, a limitation known as the entity missing problem in compositional generation. While prior studies suggested that adjusting cross-attention maps during the denoising process could alleviate this problem, they did not systematically investigate which objective functions could best address it. This study examines three potential causes of the entity-missing problem, focusing on cross-attention dynamics: (1) insufficient attention intensity for certain entities, (2) overly broad attention spread, and (3) excessive overlap between attention maps of different entities. We found that reducing overlap in attention maps between entities can effectively minimize the rate of entity missing. Specifically, we hypothesize that tokens related to specific entities compete for attention on certain image regions during the denoising process, which can lead to divided attention across tokens and prevent accurate representation of each entity. To address this issue, we introduced four loss functions, Intersection over Union (IoU), center-of-mass (CoM) distance, Kullback-Leibler (KL) divergence, and clustering compactness (CC) to regulate attention overlap during denoising steps without the need for retraining. Experimental results across a wide variety of benchmarks reveal that these proposed training-free methods significantly improve compositional accuracy, outperforming previous approaches in visual question answering (VQA), captioning scores, CLIP similarity, and human evaluations. Notably, these methods improved human evaluation scores by 9% over the best baseline, demonstrating substantial improvements in compositional alignment.
Abstract:Classifiers trained with Empirical Risk Minimization (ERM) tend to rely on attributes that have high spurious correlation with the target. This can degrade the performance on underrepresented (or 'minority') groups that lack these attributes, posing significant challenges for both out-of-distribution generalization and fairness objectives. Many studies aim to enhance robustness to spurious correlation, but they sometimes depend on group annotations for training. Additionally, a common limitation in previous research is the reliance on group-annotated validation datasets for model selection. This constrains their applicability in situations where the nature of the spurious correlation is not known, or when group labels for certain spurious attributes are not available. To enhance model robustness with minimal group annotation assumptions, we propose Environment-based Validation and Loss-based Sampling (EVaLS). It uses the losses from an ERM-trained model to construct a balanced dataset of high-loss and low-loss samples, mitigating group imbalance in data. This significantly enhances robustness to group shifts when equipped with a simple post-training last layer retraining. By using environment inference methods to create diverse environments with correlation shifts, EVaLS can potentially eliminate the need for group annotation in validation data. In this context, the worst environment accuracy acts as a reliable surrogate throughout the retraining process for tuning hyperparameters and finding a model that performs well across diverse group shifts. EVaLS effectively achieves group robustness, showing that group annotation is not necessary even for validation. It is a fast, straightforward, and effective approach that reaches near-optimal worst group accuracy without needing group annotations, marking a new chapter in the robustness of trained models against spurious correlation.
Abstract:Histopathological image analysis is crucial for accurate cancer diagnosis and treatment planning. While deep learning models, especially convolutional neural networks, have advanced this field, their "black-box" nature raises concerns about interpretability and trustworthiness. Explainable Artificial Intelligence (XAI) techniques aim to address these concerns, but evaluating their effectiveness remains challenging. A significant issue with current occlusion-based XAI methods is that they often generate Out-of-Distribution (OoD) samples, leading to inaccurate evaluations. In this paper, we introduce Inpainting-Based Occlusion (IBO), a novel occlusion strategy that utilizes a Denoising Diffusion Probabilistic Model to inpaint occluded regions in histopathological images. By replacing cancerous areas with realistic, non-cancerous tissue, IBO minimizes OoD artifacts and preserves data integrity. We evaluate our method on the CAMELYON16 dataset through two phases: first, by assessing perceptual similarity using the Learned Perceptual Image Patch Similarity (LPIPS) metric, and second, by quantifying the impact on model predictions through Area Under the Curve (AUC) analysis. Our results demonstrate that IBO significantly improves perceptual fidelity, achieving nearly twice the improvement in LPIPS scores compared to the best existing occlusion strategy. Additionally, IBO increased the precision of XAI performance prediction from 42% to 71% compared to traditional methods. These results demonstrate IBO's potential to provide more reliable evaluations of XAI techniques, benefiting histopathology and other applications. The source code for this study is available at https://github.com/a-fsh-r/IBO.
Abstract:Novelty detection is a critical task for deploying machine learning models in the open world. A crucial property of novelty detection methods is universality, which can be interpreted as generalization across various distributions of training or test data. More precisely, for novelty detection, distribution shifts may occur in the training set or the test set. Shifts in the training set refer to cases where we train a novelty detector on a new dataset and expect strong transferability. Conversely, distribution shifts in the test set indicate the methods' performance when the trained model encounters a shifted test sample. We experimentally show that existing methods falter in maintaining universality, which stems from their rigid inductive biases. Motivated by this, we aim for more generalized techniques that have more adaptable inductive biases. In this context, we leverage the fact that contrastive learning provides an efficient framework to easily switch and adapt to new inductive biases through the proper choice of augmentations in forming the negative pairs. We propose a novel probabilistic auto-negative pair generation method AutoAugOOD, along with contrastive learning, to yield a universal novelty detector method. Our experiments demonstrate the superiority of our method under different distribution shifts in various image benchmark datasets. Notably, our method emerges universality in the lens of adaptability to different setups of novelty detection, including one-class, unlabeled multi-class, and labeled multi-class settings. Code: https://github.com/mojtaba-nafez/UNODE