Fan Huang

DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs

May 15, 2025

Lake Yin, Fan Huang

Abstract: As Large Language Models (LLMs) have risen in prominence over the past few years, there has been concern over the potential biases in LLMs inherited from the training data. Previous studies have examined how LLMs exhibit implicit bias, such as when response generation changes when different social contexts are introduced. We argue that this implicit bias is not only an ethical, but also a technical issue, as it reveals an inability of LLMs to accommodate extraneous information. However, unlike other measures of LLM intelligence, there are no standard methods to benchmark this specific subset of LLM bias. To bridge this gap, we developed a method for calculating an easily interpretable benchmark, DIF (Demographic Implicit Fairness), by evaluating preexisting LLM logic and math problem datasets with sociodemographic personas. We demonstrate that this method can statistically validate the presence of implicit bias in LLM behavior and find an inverse trend between question answering accuracy and implicit bias, supporting our argument.

* 7 pages, 1 figure

CMCRD: Cross-Modal Contrastive Representation Distillation for Emotion Recognition

Apr 12, 2025

Siyuan Kan, Huanyu Wu, Zhenyao Cui, Fan Huang, Xiaolong Xu, Dongrui Wu

Abstract: Emotion recognition is an important component of affective computing, and also human-machine interaction. Unimodal emotion recognition is convenient, but the accuracy may not be high enough; on the contrary, multi-modal emotion recognition may be more accurate, but it also increases the complexity and cost of the data collection system. This paper considers cross-modal emotion recognition, i.e., using both electroencephalography (EEG) and eye movement in training, but only EEG or eye movement in test. We propose cross-modal contrastive representation distillation (CMCRD), which uses a pre-trained eye movement classification model to assist the training of an EEG classification model, improving feature extraction from EEG signals, or vice versa. During test, only EEG signals (or eye movement signals) are acquired, eliminating the need for multi-modal data. CMCRD not only improves the emotion recognition accuracy, but also makes the system more simplified and practical. Experiments using three different neural network architectures on three multi-modal emotion recognition datasets demonstrated the effectiveness of CMCRD. Compared with the EEG-only model, it improved the average classification accuracy by about 6.2%.

Diffusion-augmented Graph Contrastive Learning for Collaborative Filter

Mar 20, 2025

Fan Huang, Wei Wang

Abstract: Graph-based collaborative filtering has been established as a prominent approach in recommendation systems, leveraging the inherent graph topology of user-item interactions to model high-order connectivity patterns and enhance recommendation performance. Recent advances in Graph Contrastive Learning (GCL) have demonstrated promising potential to alleviate data sparsity issues by improving representation learning through contrastive view generation and mutual information maximization. However, existing approaches lack effective data augmentation strategies. Structural augmentation risks distorting fundamental graph topology, while feature-level perturbation techniques predominantly employ uniform noise scales that fail to account for node-specific characteristics. To solve these challenges, we propose Diffusion-augmented Contrastive Learning (DGCL), an innovative framework that integrates diffusion models with contrastive learning for enhanced collaborative filtering. Our approach employs a diffusion process that learns node-specific Gaussian distributions of representations, thereby generating semantically consistent yet diversified contrastive views through reverse diffusion sampling. DGCL facilitates adaptive data augmentation based on reconstructed representations, considering both semantic coherence and node-specific features. In addition, it explores unrepresented regions of the latent sparse feature space, thereby enriching the diversity of contrastive views. Extensive experimental results demonstrate the effectiveness of DGCL on three public datasets.

Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

Jul 11, 2024

Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma (+15 more)

Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.

* 23 pages

MonoBox: Tightness-free Box-supervised Polyp Segmentation using Monotonicity Constraint

Apr 02, 2024

Qiang Hu, Zhenyu Yi, Ying Zhou, Ting Li, Fan Huang, Mei Liu, Qiang Li, Zhiwei Wang

Abstract: We propose MonoBox, an innovative box-supervised segmentation method constrained by monotonicity to liberate its training from the user-unfriendly box-tightness assumption. In contrast to conventional box-supervised segmentation, where the box edges must precisely touch the target boundaries, MonoBox leverages imprecisely-annotated boxes to achieve robust pixel-wise segmentation. The 'linchpin' is that, within the noisy zones around box edges, MonoBox discards the traditional misguiding multiple-instance learning loss, and instead optimizes a carefully-designed objective, termed monotonicity constraint. Along directions transitioning from the foreground to background, this new constraint steers responses to adhere to a trend of monotonically decreasing values. Consequently, the originally unreliable learning within the noisy zones is transformed into a correct and effective monotonicity optimization. Moreover, an adaptive label correction is introduced, enabling MonoBox to enhance the tightness of box annotations using predicted masks from the previous epoch and dynamically shrink the noisy zones as training progresses. We verify MonoBox in the box-supervised segmentation task of polyps, where satisfying box-tightness is challenging due to the vague boundaries between the polyp and normal tissues. Experiments on both public synthetic and in-house real noisy datasets demonstrate that MonoBox exceeds other anti-noise state-of-the-arts by improving Dice by at least 5.5% and 3.3%, respectively. Codes are at https://github.com/Huster-Hq/MonoBox.
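
The monotonicity idea in this abstract can be illustrated with a small sketch. The abstract does not give the loss formula, so the hinge-style penalty below is an assumption for illustration only, and the name `monotonicity_penalty` is hypothetical rather than taken from the MonoBox code. It takes responses sampled along one foreground-to-background direction and penalizes every step where the value rises.

```python
def monotonicity_penalty(responses):
    """Sum of the increases between consecutive responses.

    `responses` is a list of scores sampled along one direction that runs
    from foreground to background. The penalty is zero when the sequence
    never rises. This hinge form is an illustrative assumption, not the
    objective published by the authors.
    """
    return sum(max(0.0, nxt - cur) for cur, nxt in zip(responses, responses[1:]))
```

A sequence such as `[0.9, 0.7, 0.4, 0.1]` incurs no penalty, while a rise anywhere along the direction contributes its size to the total.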

ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?

Mar 26, 2024

Fan Huang, Haewoon Kwak, Kunwoo Park, Jisun An

Abstract: As AI becomes more integral in our lives, the need for transparency and responsibility grows. While natural language explanations (NLEs) are vital for clarifying the reasoning behind AI decisions, evaluating them through human judgments is complex and resource-intensive due to subjectivity and the need for fine-grained ratings. This study explores the alignment between ChatGPT and human assessments across multiple scales (i.e., binary, ternary, and 7-Likert scale). We sample 300 data instances from three NLE datasets and collect 900 human annotations for both informativeness and clarity scores as the text quality measurement. We further conduct paired comparison experiments under different ranges of subjectivity scores, where the baseline comes from 8,346 human annotations. Our results show that ChatGPT aligns better with humans in more coarse-grained scales. Also, paired comparisons and dynamic prompting (i.e., providing semantically similar examples in the prompt) improve the alignment. This research advances our understanding of large language models' capabilities to assess the text explanation quality in different configurations for responsible AI development.

* Accepted by LREC-COLING 2024 main conference, long paper

Token-Ensemble Text Generation: On Attacking the Automatic AI-Generated Text Detection

Feb 17, 2024

Fan Huang, Haewoon Kwak, Jisun An

Abstract: The robustness of AI-content detection models against cultivated attacks (e.g., paraphrasing or word switching) remains a significant concern. This study proposes a novel token-ensemble generation strategy to challenge the robustness of current AI-content detection approaches. We explore the ensemble attack strategy by completing the prompt with the next token generated from random candidate LLMs. We find the token-ensemble approach significantly drops the performance of AI-content detection models (The code and test sets will be released). Our findings reveal that token-ensemble generation poses a vital challenge to current detection models and underlines the need for advancing detection technologies to counter sophisticated adversarial strategies.
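
The generation loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: each "model" is a hypothetical callable that maps the current token list to a next token, standing in for a real LLM, and `token_ensemble_generate` is an invented name. At every step one candidate is drawn at random and asked for the next token.

```python
import random


def token_ensemble_generate(prompt_tokens, models, max_new_tokens, rng=random):
    """Extend a prompt one token at a time, picking a random model per step.

    `models` is a list of callables (hypothetical stand-ins for LLMs); each
    takes the tokens so far and returns the next token, or None to stop.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        model = rng.choice(models)   # a different candidate may answer each step
        next_token = model(tokens)
        if next_token is None:       # the chosen model signalled end of text
            break
        tokens.append(next_token)
    return tokens
```

With two toy models that always emit `"a"` and `"b"` respectively, the output is the prompt followed by an interleaving of the two tokens whose order depends on the random draws.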

* Submitted to ACL 2024

Cold & Warm Net: Addressing Cold-Start Users in Recommender Systems

Sep 27, 2023

Xiangyu Zhang, Zongqiang Kuang, Zehao Zhang, Fan Huang, Xianfeng Tan

Abstract: Cold-start recommendation is one of the major challenges faced by recommender systems (RS). Herein, we focus on the user cold-start problem. Recently, methods utilizing side information or meta-learning have been used to model cold-start users. However, it is difficult to deploy these methods to industrial RS. There has not been much research that pays attention to the user cold-start problem in the matching stage. In this paper, we propose Cold & Warm Net based on expert models who are responsible for modeling cold-start and warm-up users respectively. A gate network is applied to incorporate the results from two experts. Furthermore, dynamic knowledge distillation acting as a teacher selector is introduced to assist experts in better learning user representation. With comprehensive mutual information, features highly relevant to user behavior are selected for the bias net which explicitly models user behavior bias. Finally, we evaluate our Cold & Warm Net on public datasets in comparison to models commonly applied in the matching stage and it outperforms other models on all user types. The proposed model has also been deployed on an industrial short video platform and achieves a significant increase in app dwell time and user retention rate.
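
The gating step mentioned in this abstract can be pictured with a short sketch. The abstract does not specify the gate's form, so the scalar sigmoid gate below is an assumption, and `gate_weight` and `fuse_experts` are hypothetical names. The gate maps user features to a weight in (0, 1), which then blends the cold-start expert's output with the warm-up expert's output.

```python
import math


def gate_weight(features, weights, bias=0.0):
    """Sigmoid of a linear score over user features (assumed gate form)."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def fuse_experts(cold_output, warm_output, gate):
    """Blend two expert embeddings: gate * cold + (1 - gate) * warm."""
    return [gate * c + (1.0 - gate) * w for c, w in zip(cold_output, warm_output)]
```

A gate near 1 lets the cold-start expert dominate, and a gate near 0 hands the user representation to the warm-up expert.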

Random Padding Data Augmentation

Feb 17, 2023

Nan Yang, Laicheng Zhong, Fan Huang, Dong Yuan, Wei Bao

Abstract: The convolutional neural network (CNN) learns the same object in different positions in images, which can improve the recognition accuracy of the model. An implication of this is that CNN may know where the object is. The usefulness of the features' spatial information in CNNs has not been well investigated. In this paper, we found that the model's learning of features' position information hindered the learning of the features' relationship. Therefore, we introduced Random Padding, a new type of padding method for training CNNs that impairs the architecture's capacity to learn position information by adding zero-padding randomly to half of the border of feature maps. Random Padding is parameter-free, simple to construct, and compatible with the majority of CNN-based recognition models. This technique is also complementary to data augmentations such as random cropping, rotation, flipping and erasing, and consistently improves the performance of image classification over strong baselines.
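
One plausible reading of "adding zero-padding randomly to half of the border" is sketched below. The abstract does not spell out the exact scheme, so this is an assumption: instead of padding all four sides by `pad`, the sketch pads one randomly chosen vertical side and one randomly chosen horizontal side by `2 * pad`, which keeps the output the same size as ordinary symmetric padding. The function name `random_pad` is hypothetical.

```python
import random


def random_pad(feature_map, pad=1, rng=random):
    """Zero-pad a 2D feature map on two randomly chosen sides.

    The map (a list of equal-length rows) is shifted so that all vertical
    padding lands on the top or the bottom, and all horizontal padding on
    the left or the right. Output size equals symmetric padding by `pad`.
    """
    height, width = len(feature_map), len(feature_map[0])
    top = 2 * pad if rng.random() < 0.5 else 0    # rows of zeros above the map
    left = 2 * pad if rng.random() < 0.5 else 0   # columns of zeros to its left
    padded = [[0.0] * (width + 2 * pad) for _ in range(height + 2 * pad)]
    for i in range(height):
        for j in range(width):
            padded[top + i][left + j] = feature_map[i][j]
    return padded
```

Calling `random_pad` on a 2x2 map with `pad=1` returns a 4x4 map in which the original values occupy one of the four corners, chosen at random on each call.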

Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech

Feb 11, 2023

Fan Huang, Haewoon Kwak, Jisun An

Abstract: Recent studies have alarmed that many online hate speeches are implicit. With its subtle nature, the explainability of the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used for providing natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their qualities by comparison with human-generated NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.
