Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaoying Tang

Fairness-aware Anomaly Detection via Fair Projection

May 16, 2025

Feng Xiao, Xiaoying Tang, Jicong Fan

Abstract:Unsupervised anomaly detection is a critical task in many high-social-impact applications such as finance, healthcare, social media, and cybersecurity, where demographics involving age, gender, race, disease, etc, are used frequently. In these scenarios, possible bias from anomaly detection systems can lead to unfair treatment for different groups and even exacerbate social bias. In this work, first, we thoroughly analyze the feasibility and necessary assumptions for ensuring group fairness in unsupervised anomaly detection. Second, we propose a novel fairness-aware anomaly detection method FairAD. From the normal training data, FairAD learns a projection to map data of different demographic groups to a common target distribution that is simple and compact, and hence provides a reliable base to estimate the density of the data. The density can be directly used to identify anomalies while the common target distribution ensures fairness between different groups. Furthermore, we propose a threshold-free fairness metric that provides a global view for model's fairness, eliminating dependence on manual threshold selection. Experiments on real-world benchmarks demonstrate that our method achieves an improved trade-off between detection accuracy and fairness under both balanced and skewed data across different groups.

Via

Access Paper or Ask Questions

Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

Dec 29, 2024

Shiyun Chen, Li Lin, Pujin Cheng, ZhiCheng Jin, JianJian Chen, HaiDong Zhu, Kenneth K. Y. Wong, Xiaoying Tang

Figure 1 for Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

Figure 2 for Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

Figure 3 for Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

Figure 4 for Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

Abstract:Multimodal learning has been demonstrated to enhance performance across various clinical tasks, owing to the diverse perspectives offered by different modalities of data. However, existing multimodal segmentation methods rely on well-registered multimodal data, which is unrealistic for real-world clinical images, particularly for indistinct and diffuse regions such as liver tumors. In this paper, we introduce Diff4MMLiTS, a four-stage multimodal liver tumor segmentation pipeline: pre-registration of the target organs in multimodal CTs; dilation of the annotated modality's mask and followed by its use in inpainting to obtain multimodal normal CTs without tumors; synthesis of strictly aligned multimodal CTs with tumors using the latent diffusion model based on multimodal CT features and randomly generated tumor masks; and finally, training the segmentation model, thus eliminating the need for strictly aligned multimodal data. Extensive experiments on public and internal datasets demonstrate the superiority of Diff4MMLiTS over other state-of-the-art multimodal segmentation methods.

Via

Access Paper or Ask Questions

Federated Unlearning with Gradient Descent and Conflict Mitigation

Dec 28, 2024

Zibin Pan, Zhichao Wang, Chi Li, Kaiyan Zheng, Boqi Wang, Xiaoying Tang, Junhua Zhao

Figure 1 for Federated Unlearning with Gradient Descent and Conflict Mitigation

Figure 2 for Federated Unlearning with Gradient Descent and Conflict Mitigation

Figure 3 for Federated Unlearning with Gradient Descent and Conflict Mitigation

Figure 4 for Federated Unlearning with Gradient Descent and Conflict Mitigation

Abstract:Federated Learning (FL) has received much attention in recent years. However, although clients are not required to share their data in FL, the global model itself can implicitly remember clients' local data. Therefore, it's necessary to effectively remove the target client's data from the FL global model to ease the risk of privacy leakage and implement ``the right to be forgotten". Federated Unlearning (FU) has been considered a promising way to remove data without full retraining. But the model utility easily suffers significant reduction during unlearning due to the gradient conflicts. Furthermore, when conducting the post-training to recover the model utility, the model is prone to move back and revert what has already been unlearned. To address these issues, we propose Federated Unlearning with Orthogonal Steepest Descent (FedOSD). We first design an unlearning Cross-Entropy loss to overcome the convergence issue of the gradient ascent. A steepest descent direction for unlearning is then calculated in the condition of being non-conflicting with other clients' gradients and closest to the target client's gradient. This benefits to efficiently unlearn and mitigate the model utility reduction. After unlearning, we recover the model utility by maintaining the achievement of unlearning. Finally, extensive experiments in several FL scenarios verify that FedOSD outperforms the SOTA FU algorithms in terms of unlearning and model utility.

* To be published in the Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI-25)

Via

Access Paper or Ask Questions

NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Dec 02, 2024

Bikang Pan, Qun Li, Xiaoying Tang, Wei Huang, Zhen Fang, Feng Liu, Jingya Wang, Jingyi Yu, Ye Shi

Figure 1 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Figure 2 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Figure 3 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Figure 4 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Abstract:The emergence of vision-language foundation models, such as CLIP, has revolutionized image-text representation, enabling a broad range of applications via prompt learning. Despite its promise, real-world datasets often contain noisy labels that can degrade prompt learning performance. In this paper, we demonstrate that using mean absolute error (MAE) loss in prompt learning, named PromptMAE, significantly enhances robustness against noisy labels while maintaining high accuracy. Though MAE is straightforward and recognized for its robustness, it is rarely used in noisy-label learning due to its slow convergence and poor performance outside prompt learning scenarios. To elucidate the robustness of PromptMAE, we leverage feature learning theory to show that MAE can suppress the influence of noisy samples, thereby improving the signal-to-noise ratio and enhancing overall robustness. Additionally, we introduce PromptOT, a prompt-based optimal transport data purification method to enhance the robustness further. PromptOT employs text encoder representations in vision-language models as prototypes to construct an optimal transportation matrix. This matrix effectively partitions datasets into clean and noisy subsets, allowing for the application of cross-entropy loss to the clean subset and MAE loss to the noisy subset. Our Noise-Label Prompt Learning method, named NLPrompt, offers a simple and efficient approach that leverages the expressive representation and precise alignment capabilities of vision-language models for robust prompt learning. We validate NLPrompt through extensive experiments across various noise settings, demonstrating significant performance improvements.

Via

Access Paper or Ask Questions

AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer

Dec 01, 2024

Jin Lyu, Tianyi Zhu, Yi Gu, Li Lin, Pujin Cheng, Yebin Liu, Xiaoying Tang, Liang An

Figure 1 for AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer

Figure 2 for AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer

Figure 3 for AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer

Figure 4 for AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer

Abstract:Quantitative analysis of animal behavior and biomechanics requires accurate animal pose and shape estimation across species, and is important for animal welfare and biological research. However, the small network capacity of previous methods and limited multi-species dataset leave this problem underexplored. To this end, this paper presents AniMer to estimate animal pose and shape using family aware Transformer, enhancing the reconstruction accuracy of diverse quadrupedal families. A key insight of AniMer is its integration of a high-capacity Transformer-based backbone and an animal family supervised contrastive learning scheme, unifying the discriminative understanding of various quadrupedal shapes within a single framework. For effective training, we aggregate most available open-sourced quadrupedal datasets, either with 3D or 2D labels. To improve the diversity of 3D labeled data, we introduce CtrlAni3D, a novel large-scale synthetic dataset created through a new diffusion-based conditional image generation pipeline. CtrlAni3D consists of about 10k images with pixel-aligned SMAL labels. In total, we obtain 41.3k annotated images for training and validation. Consequently, the combination of a family aware Transformer network and an expansive dataset enables AniMer to outperform existing methods not only on 3D datasets like Animal3D and CtrlAni3D, but also on out-of-distribution Animal Kingdom dataset. Ablation studies further demonstrate the effectiveness of our network design and CtrlAni3D in enhancing the performance of AniMer for in-the-wild applications. The project page of AniMer is https://luoxue-star.github.io/AniMer_project_page/.

Via

Access Paper or Ask Questions

FedMABA: Towards Fair Federated Learning through Multi-Armed Bandits Allocation

Oct 26, 2024

Zhichao Wang, Lin Wang, Yongxin Guo, Ying-Jun Angela Zhang, Xiaoying Tang

Abstract:The increasing concern for data privacy has driven the rapid development of federated learning (FL), a privacy-preserving collaborative paradigm. However, the statistical heterogeneity among clients in FL results in inconsistent performance of the server model across various clients. Server model may show favoritism towards certain clients while performing poorly for others, heightening the challenge of fairness. In this paper, we reconsider the inconsistency in client performance distribution and introduce the concept of adversarial multi-armed bandit to optimize the proposed objective with explicit constraints on performance disparities. Practically, we propose a novel multi-armed bandit-based allocation FL algorithm (FedMABA) to mitigate performance unfairness among diverse clients with different data distributions. Extensive experiments, in different Non-I.I.D. scenarios, demonstrate the exceptional performance of FedMABA in enhancing fairness.

Via

Access Paper or Ask Questions

Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs

Oct 15, 2024

Sihang Zhao, Youliang Yuan, Xiaoying Tang, Pinjia He

Figure 1 for Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs

Figure 2 for Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs

Figure 3 for Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs

Figure 4 for Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs

Abstract:Multimodal Large Language Models (MLLMs) demonstrate a strong understanding of the real world and can even handle complex tasks. However, they still fail on some straightforward visual question-answering (VQA) problems. This paper dives deeper into this issue, revealing that models tend to err when answering easy questions (e.g. Yes/No questions) about an image, even though they can correctly describe it. We refer to this model behavior discrepancy between difficult and simple questions as model laziness. To systematically investigate model laziness, we manually construct LazyBench, a benchmark that includes Yes/No, multiple choice, short answer questions, and image description tasks that are related to the same subjects in the images. Based on LazyBench, we observe that laziness widely exists in current advanced MLLMs (e.g. GPT-4o, Gemini-1.5-pro, Claude 3 and LLaVA-v1.5-13B), and it is more pronounced on stronger models. We also analyze the VQA v2 (LLaVA-v1.5-13B) benchmark and find that about half of its failure cases are caused by model laziness, which further highlights the importance of ensuring that the model fully utilizes its capability. To this end, we conduct preliminary exploration on how to mitigate laziness and find that chain of thought (CoT) can effectively address this issue.

* EMNLP 2024 Findings

Via

Access Paper or Ask Questions

FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Oct 15, 2024

Xinpeng Wang, Xiaoying Tang

Figure 1 for FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Figure 2 for FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Figure 3 for FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Figure 4 for FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Abstract:Domain Generalization (DG) aims to train models that can effectively generalize to unseen domains. However, in the context of Federated Learning (FL), where clients collaboratively train a model without directly sharing their data, most existing DG algorithms are not directly applicable to the FL setting due to privacy constraints, as well as the limited data quantity and domain diversity at each client. To tackle these challenges, we propose FedCCRL, a novel federated domain generalization method that significantly improves the model's ability to generalize to unseen domains without compromising privacy or incurring excessive computational and communication costs. Specifically, we adapt MixStyle to the federated setting to transfer domain-specific features while AugMix is employed to perturb domain-invariant features. Furthermore, we leverage supervised contrastive loss for representation alignment and utilize Jensen-Shannon divergence to ensure consistent predictions between original and augmented samples. Extensive experimental results demonstrate that FedCCRL achieves the state-of-the-art performances on the PACS, OfficeHome and miniDomainNet datasets across varying numbers of clients. Code is available at https://github.com/SanphouWang/FedCCRL.

Via

Access Paper or Ask Questions

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Oct 08, 2024

Yongxin Guo, Jingyu Liu, Mingda Li, Xiaoying Tang, Qingbin Liu, Xi Chen

Figure 1 for TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Figure 2 for TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Figure 3 for TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Figure 4 for TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Abstract:Video Temporal Grounding (VTG) is a crucial capability for video understanding models and plays a vital role in downstream tasks such as video browsing and editing. To effectively handle various tasks simultaneously and enable zero-shot prediction, there is a growing trend in employing video LLMs for VTG tasks. However, current video LLM-based methods rely exclusively on natural language generation, lacking the ability to model the clear structure inherent in videos, which restricts their effectiveness in tackling VTG tasks. To address this issue, this paper first formally introduces causal event modeling framework, which represents videos as sequences of events, and predict the current event using previous events, video inputs, and textural instructions. Each event consists of three components: timestamps, salient scores, and textual captions. We then propose a novel task-interleaved video LLM called TRACE to effectively implement the causal event modeling framework in practice. The TRACE processes visual frames, timestamps, salient scores, and text as distinct tasks, employing various encoders and decoding heads for each. Task tokens are arranged in an interleaved sequence according to the causal event modeling framework's formulation. Extensive experiments on various VTG tasks and datasets demonstrate the superior performance of TRACE compared to state-of-the-art video LLMs. Our model and code are available at \url{https://github.com/gyxxyg/TRACE}.

Via

Access Paper or Ask Questions

FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification

Aug 05, 2024

Yijin Huang, Pujin Cheng, Roger Tam, Xiaoying Tang

Figure 1 for FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification

Figure 2 for FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification

Figure 3 for FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification

Figure 4 for FPT+: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification

Abstract:The success of large-scale pre-trained models has established fine-tuning as a standard method for achieving significant improvements in downstream tasks. However, fine-tuning the entire parameter set of a pre-trained model is costly. Parameter-efficient transfer learning (PETL) has recently emerged as a cost-effective alternative for adapting pre-trained models to downstream tasks. Despite its advantages, the increasing model size and input resolution present challenges for PETL, as the training memory consumption is not reduced as effectively as the parameter usage. In this paper, we introduce Fine-grained Prompt Tuning plus (FPT+), a PETL method designed for high-resolution medical image classification, which significantly reduces memory consumption compared to other PETL methods. FPT+ performs transfer learning by training a lightweight side network and accessing pre-trained knowledge from a large pre-trained model (LPM) through fine-grained prompts and fusion modules. Specifically, we freeze the LPM and construct a learnable lightweight side network. The frozen LPM processes high-resolution images to extract fine-grained features, while the side network employs the corresponding down-sampled low-resolution images to minimize the memory usage. To enable the side network to leverage pre-trained knowledge, we propose fine-grained prompts and fusion modules, which collaborate to summarize information through the LPM's intermediate activations. We evaluate FPT+ on eight medical image datasets of varying sizes, modalities, and complexities. Experimental results demonstrate that FPT+ outperforms other PETL methods, using only 1.03% of the learnable parameters and 3.18% of the memory required for fine-tuning an entire ViT-B model. Our code is available at https://github.com/YijinHuang/FPT.

Via

Access Paper or Ask Questions