Lili Zhao

MiMu: Mitigating Multiple Shortcut Learning Behavior of Transformers

Apr 14, 2025

Lili Zhao, Qi Liu, Wei Chen, Liyi Chen, Ruijun Sun, Min Hou, Yang Wang, Shijin Wang

Abstract: Empirical Risk Minimization (ERM) models often rely on spurious correlations between features and labels during learning, leading to shortcut learning behavior that undermines robust generalization. Current research mainly targets identifying or mitigating a single shortcut; however, in real-world scenarios, the cues within the data are diverse and unknown. Our empirical studies reveal that models rely on different shortcuts to varying extents: compared with weak shortcuts, models depend more heavily on strong shortcuts, which results in poor generalization. To address these challenges, we propose MiMu, a novel method integrated with Transformer-based ERMs and designed to Mitigate Multiple shortcut learning behavior, which incorporates a self-calibration strategy and a self-improvement strategy. In the source model, the self-calibration strategy prevents the model from relying on shortcuts and making overconfident predictions. Then, we design a self-improvement strategy in the target model to reduce its reliance on multiple shortcuts. The random mask strategy randomly masks a portion of the attention positions to diversify the focus of the target model rather than concentrating it on a fixed region. Meanwhile, the adaptive attention alignment module aligns the attention weights of the target model with those of the calibrated source model, without requiring post-hoc attention maps or supervision. Finally, extensive experiments on Natural Language Processing (NLP) and Computer Vision (CV) tasks demonstrate the effectiveness of MiMu in improving robust generalization.
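The abstract names two mechanisms for the target model, random attention masking and attention alignment with the calibrated source model, but does not give their equations. Below is a minimal Python sketch of both ideas for a single attention row. It rests on stated assumptions: masking sets pre-softmax scores to -inf, and KL divergence serves as a stand-in alignment loss. The helper names (`random_mask_attention`, `attention_alignment_loss`) are hypothetical, and this is not MiMu's published implementation.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax; -inf entries receive zero weight."""
    finite = [s for s in scores if s != -math.inf]
    m = max(finite)
    exps = [0.0 if s == -math.inf else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_mask_attention(scores, mask_ratio, rng):
    """Hypothetical helper: mask a random subset of key positions before softmax.

    At least one position is always kept so the result is a valid distribution.
    """
    n = len(scores)
    n_mask = min(int(mask_ratio * n), n - 1)
    masked = set(rng.sample(range(n), n_mask))
    return softmax([-math.inf if i in masked else s for i, s in enumerate(scores)])

def attention_alignment_loss(target_attn, source_attn, eps=1e-12):
    """Hypothetical alignment loss: KL(target || source); masked (zero) entries add nothing."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(target_attn, source_attn) if p > 0)

# Toy usage with made-up scores for one query over four key positions.
rng = random.Random(0)
source_attn = softmax([1.0, 1.0, 0.2, 0.8])  # stands in for the calibrated source model
target_attn = random_mask_attention([2.0, 0.5, 0.1, 1.2], mask_ratio=0.5, rng=rng)
loss = attention_alignment_loss(target_attn, source_attn)
```

Masking before the softmax renormalizes the remaining weights. As a result, the target model cannot put all of its attention on positions that happen to be hidden in a given step. The abstract does not state whether MiMu masks scores, weights, or both.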

Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models

Oct 17, 2024

Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu

Abstract: Large Language Models (LLMs) have shown remarkable capabilities in various natural language processing tasks. However, LLMs may rely on dataset biases as shortcuts for prediction, which can significantly impair their robustness and generalization capabilities. This paper presents Shortcut Suite, a comprehensive test suite designed to evaluate the impact of shortcuts on LLMs' performance, incorporating six shortcut types, five evaluation metrics, and four prompting strategies. Our extensive experiments yield several key findings: 1) LLMs demonstrate varying reliance on shortcuts for downstream tasks, significantly impairing their performance. 2) Larger LLMs are more likely to utilize shortcuts under zero-shot and few-shot in-context learning prompts. 3) Chain-of-thought prompting notably reduces shortcut reliance and outperforms other prompting strategies, while few-shot prompts generally underperform compared to zero-shot prompts. 4) LLMs often exhibit overconfidence in their predictions, especially when dealing with datasets that contain shortcuts. 5) LLMs generally have a lower explanation quality in shortcut-laden datasets, with errors falling into three types: distraction, disguised comprehension, and logical fallacy. Our findings offer new insights for evaluating robustness and generalization in LLMs and suggest potential directions for mitigating the reliance on shortcuts. The code is available at https://github.com/yyhappier/ShortcutSuite.git.
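Finding 4 concerns overconfidence. The abstract does not list Shortcut Suite's five metrics, so the sketch below assumes a common choice: expected calibration error (ECE). ECE measures the gap between a model's stated confidence and its accuracy. It is shown only as a typical way to quantify overconfidence and is not necessarily one of the suite's metrics.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average |confidence - accuracy| per bin,
    weighted by the fraction of predictions that fall in each bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

# Toy example: a model that is 95% confident but right only half the time.
ece = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [True, False, True, False])
```

ECE takes an absolute value, so it does not distinguish over- from underconfidence. Comparing mean confidence with accuracy shows the direction of the gap.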

Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models

Apr 13, 2024

Linan Yue, Qi Liu, Lili Zhao, Li Wang, Weibo Gao, Yanqing An

Abstract: With the development of legal intelligence, Criminal Court View Generation, which aims to generate concise and coherent texts that summarize case facts and explain verdicts, has attracted much attention as a crucial legal intelligence task. Existing research explores the key information in case facts to yield court views. Most existing methods employ a coarse-grained approach that partitions the facts into broad segments (e.g., verdict-related sentences) to make predictions. However, this approach fails to capture the complex details present in case facts, such as various criminal elements and legal events. To this end, we propose an Event Grounded Generation (EGG) method for criminal court view generation with cooperative (Large) Language Models, which introduces fine-grained event information into the generation. Specifically, we first design an LLM-based extraction method that can extract events from case facts without massive event annotations. Then, we incorporate the extracted events into court view generation by merging case facts and events. Besides, considering the computational burden of using LLMs in the extraction phase of EGG, we propose an LLM-free EGG method that eliminates the need for LLM-based event extraction in the inference phase. Extensive experimental results on a real-world dataset clearly validate the effectiveness of our proposed method.

* Accepted to SIGIR2024
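The abstract says extracted events are merged with case facts but does not specify the format. The following hypothetical sketch shows one way to serialize events and append them to the fact text as a single generator input. The `Event` structure, the `<event>` tag, and the `[SEP]` separator are illustrative assumptions, not EGG's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """Hypothetical event record: a trigger word plus role -> argument pairs."""
    trigger: str
    arguments: dict = field(default_factory=dict)

def merge_facts_and_events(facts, events, sep=" [SEP] "):
    """Append serialized events to the fact description (illustrative format only)."""
    serialized = []
    for ev in events:
        args = "; ".join(f"{role}: {value}" for role, value in ev.arguments.items())
        serialized.append(f"<event> {ev.trigger} ({args})")
    if not serialized:
        return facts
    return facts + sep + " ".join(serialized)

# Toy usage with an invented case description.
merged = merge_facts_and_events(
    "The defendant took a phone from the victim's bag.",
    [Event("took", {"agent": "defendant", "object": "phone"})],
)
```

Keeping the facts first and the events after a separator leaves the original text intact. This lets a sequence-to-sequence generator attend to both the raw facts and their structured summary.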

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Jun 13, 2023

Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao(+37 more)

Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. Of 305 patients enrolled from 12 hospitals, 265 were finally included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus constructed models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV based on LSM and SSM images and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), significantly and robustly better than canonical risk indicators, including the LSM and SSM values. Moreover, DLRP was better than the models using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperformed those of LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over the canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information than those of LSM, yielding better accuracy in predicting HRV.
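An AUC like those reported above can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The minimal sketch below computes that rank-based (Mann-Whitney) AUC on toy inputs rather than study data. It does not reproduce the confidence intervals or p-values.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop costs O(n*m) and is written for clarity. A sort-based version scales better on larger validation sets.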

Lossless Point Cloud Attribute Compression with Normal-based Intra Prediction

Jun 23, 2021

Qian Yin, Qingshan Ren, Lili Zhao, Wenyi Wang, Jianwen Chen

Abstract: Sparse LiDAR point clouds are becoming increasingly popular in various applications, e.g., autonomous driving. However, for this type of data, much remains under-explored in the corresponding compression framework proposed by MPEG, i.e., geometry-based point cloud compression (G-PCC). In G-PCC, only distance-based similarity is considered in the intra prediction for attribute compression. In this paper, we propose a normal-based intra prediction scheme that provides more efficient lossless attribute compression by introducing the normals of point clouds. The angle between normals is used to further explore accurate local similarity, which optimizes the selection of predictors. We implement our method in the G-PCC reference software. Experimental results on LiDAR-acquired datasets demonstrate that our proposed method delivers better compression performance than the G-PCC anchor, with an average gain of 2.1% for lossless attribute coding.

* Accepted by the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting 2021
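The abstract describes using the angle between normals, alongside distance, to choose predictors for lossless attribute coding. The sketch below rests on explicit assumptions:

- The selection cost multiplies Euclidean distance by a penalty that grows with the normal angle.
- The prediction is an inverse-distance-weighted average of the k chosen neighbors.
- Only the integer residual is coded.

The cost function and `angle_weight` are hypothetical. They are not the exact rule used by the paper or the G-PCC reference software.

```python
import math

def angle_between(n1, n2):
    """Angle in radians between two normals, ignoring their sign (orientation)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norms = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    return math.acos(min(1.0, abs(dot) / norms))

def predict_attribute(point, normal, candidates, k=3, angle_weight=1.0):
    """Hypothetical normal-aware predictor selection for one point.

    candidates: (position, normal, attribute) tuples for already-coded points.
    Returns an integer prediction so the residual stays an integer.
    """
    def cost(cand):
        pos, nrm, _ = cand
        return math.dist(point, pos) * (1.0 + angle_weight * angle_between(normal, nrm))

    chosen = sorted(candidates, key=cost)[:k]
    weights = [1.0 / max(math.dist(point, pos), 1e-9) for pos, _, _ in chosen]
    weighted = sum(w * attr for w, (_, _, attr) in zip(weights, chosen))
    return round(weighted / sum(weights))

# Toy example: the nearer neighbor lies on a surface with a perpendicular normal.
candidates = [
    ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 100),  # farther, same orientation
    ((0.9, 0.0, 0.0), (1.0, 0.0, 0.0), 10),   # nearer, perpendicular normal
]
prediction = predict_attribute((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), candidates, k=1)
actual = 97
residual = actual - prediction   # what the encoder would entropy-code
recovered = prediction + residual  # the decoder repeats the prediction and adds the residual
```

With `angle_weight=0.0` the sketch falls back to pure distance-based selection and picks the nearer but differently oriented neighbor. The normal penalty instead favors the neighbor on the same surface.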

RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network

Jun 01, 2021

Lili Zhao, Zezhi Zhu, Xuhu Lin, Xuezhou Guo, Qian Yin, Wenyi Wang, Jianwen Chen

Abstract: LiDAR point cloud frame interpolation, which synthesizes the intermediate frame between captured frames, has emerged as an important issue for many applications. In particular, it can reduce the amount of point cloud data to be transmitted: the intermediate frame is predicted from the reference frames, upsampling the data to a higher frame rate. However, due to the high-dimensional and sparse characteristics of point clouds, predicting the intermediate frame is more difficult for LiDAR point clouds than for videos. In this paper, we propose a novel LiDAR point cloud frame interpolation method that uses range images (RIs) as an intermediate representation with CNNs to perform frame interpolation. Since the inherent characteristics of RIs differ from those of color images, we introduce spatially adaptive convolutions to extract range features adaptively, and we present a highly efficient flow estimation method to generate optical flows. The proposed model then warps the input frames and range features based on the optical flows to synthesize the interpolated frame. Extensive experiments on the KITTI dataset demonstrate that our method consistently achieves superior frame interpolation results, with better perceptual quality than state-of-the-art video frame interpolation methods. The proposed method could be integrated into any LiDAR point cloud compression system for inter prediction.

* Accepted by the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting 2021
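RAI-Net operates on range images (RIs) rather than raw points. The abstract does not describe how points are projected, so the sketch below uses a common spherical projection. The 64 x 1024 resolution and the +3 to -25 degree vertical field of view are assumptions, chosen to roughly cover the vertical field of view of the sensor used in KITTI.

```python
import math

def to_range_image(points, height=64, width=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) LiDAR points into an H x W range image (0.0 marks an empty pixel)."""
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)    # azimuth in [-pi, pi]
        pitch = math.asin(z / r)  # elevation angle
        u = 0.5 * (1.0 - yaw / math.pi) * width        # column from azimuth
        v = (1.0 - (pitch - fov_down) / fov) * height  # row from elevation
        col = min(width - 1, max(0, int(u)))
        row = min(height - 1, max(0, int(v)))
        # Keep the closest return when several points land on the same pixel.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image

# Two points straight ahead at different distances fall on the same pixel.
img = to_range_image([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0)])
```

Once points are in this dense 2D grid, standard CNN operations and 2D optical flow can be applied. This is the motivation the abstract gives for using RIs as the intermediate representation.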

An Unsupervised Optical Flow Estimation For LiDAR Image Sequences

May 28, 2021

Xuezhou Guo, Xuhu Lin, Lili Zhao, Zezhi Zhu, Jianwen Chen

Abstract: In recent years, LiDAR images, a compact 2D representation of 3D LiDAR point clouds, have been widely applied in various tasks, e.g., 3D semantic segmentation and LiDAR point cloud compression (PCC). Among these tasks, optical flow estimation for LiDAR image sequences has become a key issue, especially for motion estimation in the inter prediction of PCC. However, existing optical flow estimation models are likely to be unreliable for LiDAR images. In this work, we first propose a lightweight flow estimation model for LiDAR image sequences. The key novelty of our method lies in two aspects. First, because LiDAR images have different characteristics from ordinary color images (a spatially varying feature distribution), we introduce an attention mechanism into our model to improve the quality of the estimated flow. Second, to tackle the lack of large-scale LiDAR image annotations, we present an unsupervised method that directly minimizes the inconsistency between the reference image and the image reconstructed from the estimated optical flow. Extensive experimental results show that our proposed model outperforms other mainstream models on the KITTI dataset with far fewer parameters.

* Accepted by ICIP2021
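The unsupervised objective described above minimizes the inconsistency between the reference image and the image reconstructed from the estimated flow. The minimal sketch below implements such a photometric loss. It assumes backward bilinear warping and a mean absolute (L1) difference, which are common choices the abstract does not confirm. The attention mechanism and the network itself are not shown.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a 2D list at real coordinates (x = column, y = row); zero outside the image."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    total = 0.0
    for yy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total

def warp(img, flow):
    """Backward warp: out[r][c] = img sampled at (c + u, r + v), where flow[r][c] = (u, v)."""
    return [[bilinear_sample(img, c + flow[r][c][0], r + flow[r][c][1])
             for c in range(len(img[0]))] for r in range(len(img))]

def photometric_loss(reference, target, flow):
    """Mean absolute difference between the reference and the flow-warped target."""
    warped = warp(target, flow)
    diffs = [abs(a - b) for ra, rb in zip(reference, warped) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Toy 1 x 4 "LiDAR image" whose content moved one pixel to the right.
reference = [[1.0, 2.0, 3.0, 0.0]]
target = [[0.0, 1.0, 2.0, 3.0]]
correct_flow = [[(1.0, 0.0)] * 4]
zero_flow = [[(0.0, 0.0)] * 4]
```

Because the loss needs only the two frames and the predicted flow, no ground-truth flow annotations are required. This is what makes the training unsupervised.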

BDNNSurv: Bayesian deep neural networks for survival analysis using pseudo values

Jan 07, 2021

Dai Feng, Lili Zhao

Abstract: There has been increasing interest in modeling survival data using deep learning methods in medical research. In this paper, we propose a Bayesian hierarchical deep neural network model for modeling and predicting survival data. Compared with previously studied methods, the new proposal provides not only point estimates of survival probabilities but also quantification of the corresponding uncertainty, which can be crucial in predictive modeling and subsequent decision making. The favorable statistical properties of the point and uncertainty estimates were demonstrated by simulation studies and real data analysis. Python code implementing the proposed approach is provided.
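The Bayesian model outputs a distribution over each predicted survival probability rather than a single number. The sketch below assumes posterior draws of S(t | x) are already available; the abstract does not say how they are obtained. It summarizes the draws as a posterior mean and an equal-tailed credible interval, using simple order statistics. The synthetic Beta draws stand in for real model output.

```python
import math
import random
import statistics

def summarize_posterior(samples, level=0.95):
    """Posterior mean and an equal-tailed credible interval from posterior draws."""
    ordered = sorted(samples)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0
    lower = ordered[math.floor(alpha * (n - 1))]
    upper = ordered[math.ceil((1.0 - alpha) * (n - 1))]
    return statistics.fmean(ordered), (lower, upper)

# Synthetic stand-in for posterior draws of S(t | x) for one patient.
rng = random.Random(42)
draws = [rng.betavariate(80, 20) for _ in range(2000)]
mean_survival, (lower, upper) = summarize_posterior(draws)
```

Reporting the interval next to the point estimate is what distinguishes this output from a non-Bayesian network that returns only S(t | x).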

Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty

Mar 05, 2020

Yongle Luo, Kun Dong, Lili Zhao, Zhiyong Sun, Chao Zhou, Bo Song

Abstract: Efficient and effective learning is one of the ultimate goals of deep reinforcement learning (DRL), although a compromise between the two is usually made, especially in robot manipulation applications. Learning is expensive for robot manipulation tasks, and its effectiveness can be affected by system uncertainty. To address these challenges, we propose a simple but powerful reward shaping method, Dense2Sparse. It combines the fast convergence of dense rewards with the noise isolation of sparse rewards to balance learning efficiency and effectiveness, which makes it suitable for robot manipulation tasks. We evaluated Dense2Sparse with a series of ablation experiments using a state representation model with system uncertainty. The experimental results show that Dense2Sparse obtains a higher expected reward than standalone dense or sparse rewards and has superior tolerance to system uncertainty.
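The abstract does not say when or how Dense2Sparse switches between the two rewards. The hypothetical sketch below shows the idea for a reaching task, assuming three illustrative choices:

- a fixed switch episode;
- a negative-distance dense reward;
- a 0/-1 sparse success reward with a 5 cm tolerance.

```python
import math

def dense_reward(gripper_pos, goal_pos):
    """Dense shaping: negative Euclidean distance to the goal (fast early learning signal)."""
    return -math.dist(gripper_pos, goal_pos)

def sparse_reward(gripper_pos, goal_pos, tolerance=0.05):
    """Sparse signal: 0 on success, -1 otherwise (insensitive to noisy distance estimates)."""
    return 0.0 if math.dist(gripper_pos, goal_pos) <= tolerance else -1.0

def dense2sparse_reward(episode, switch_episode, gripper_pos, goal_pos):
    """Hypothetical schedule: dense reward before switch_episode, sparse reward afterwards."""
    if episode < switch_episode:
        return dense_reward(gripper_pos, goal_pos)
    return sparse_reward(gripper_pos, goal_pos)
```

For example, `dense2sparse_reward(0, 100, (0, 0, 0), (3, 4, 0))` returns the dense value -5.0. The same state at episode 100 returns the sparse value -1.0. The sparse phase no longer depends on the exact distance, which is one way to read the "noise isolation" the abstract attributes to sparse rewards.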

DNNSurv: Deep Neural Networks for Survival Analysis Using Pseudo Values

Aug 06, 2019

Lili Zhao, Dai Feng

Abstract: There has been increasing interest in modelling survival data using deep learning methods in medical research. Current approaches have focused on designing special cost functions to handle censored survival data. We propose a very different, two-step method. In the first step, we transform each subject's survival time into a series of jackknife pseudo conditional survival probabilities; in the second, we use these pseudo probabilities as a quantitative response variable in the deep neural network model. By using the pseudo values, we reduce a complex survival analysis to a standard regression problem, which greatly simplifies the neural network construction. Our two-step approach is simple yet very flexible in making risk predictions for survival data, which is very appealing from a practical point of view. The source code is freely available at http://github.com/lilizhaoUM/DNNSurv.

* 12 pages, 4 figures
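The first step can be illustrated with the standard jackknife pseudo-observation, theta_i = n*S(t) - (n-1)*S_{-i}(t). Here S is the Kaplan-Meier estimator and S_{-i} is the same estimator with subject i left out. DNNSurv uses pseudo conditional survival probabilities at a series of time points. For brevity, this sketch computes the marginal S(t) at a single time t, so it is a simplified illustration rather than the DNNSurv code linked above.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for an observed death, 0 if censored."""
    s = 1.0
    death_times = sorted({tt for tt, e in zip(times, events) if e == 1 and tt <= t})
    for u in death_times:
        at_risk = sum(1 for tt in times if tt >= u)
        deaths = sum(1 for tt, e in zip(times, events) if tt == u and e == 1)
        s *= 1.0 - deaths / at_risk
    return s

def pseudo_values(times, events, t):
    """Jackknife pseudo-observations n*S(t) - (n-1)*S_{-i}(t), one per subject."""
    n = len(times)
    full = kaplan_meier(times, events, t)
    result = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]   # leave subject i out
        loo_events = events[:i] + events[i + 1:]
        result.append(n * full - (n - 1) * kaplan_meier(loo_times, loo_events, t))
    return result

# Toy data without censoring, evaluated at t = 2.5.
pv = pseudo_values([1, 2, 3, 4], [1, 1, 1, 1], t=2.5)
```

With no censoring, each pseudo value reduces to the indicator that subject i survives past t. Under censoring the values are real numbers that can fall outside [0, 1]. In either case they serve as ordinary regression targets in the second step.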
