Abstract: Graphs have emerged as critical data structures for content analysis in various domains, such as social network analysis, bioinformatics, and recommendation systems. Node classification, a fundamental task in this context, is typically tackled using graph neural networks (GNNs). Unfortunately, conventional GNNs still struggle in scenarios with few labeled nodes, despite the prevalence of few-shot node classification tasks in real-world applications. To address this challenge, various approaches have been proposed, including graph meta-learning, transfer learning, and methods based on Large Language Models (LLMs). However, traditional meta-learning and transfer learning methods often require prior knowledge from base classes or fail to exploit the potential advantages of unlabeled nodes, while LLM-based methods may overlook the zero-shot capabilities of LLMs and rely heavily on the quality of the generated contexts. In this paper, we propose a novel approach that integrates LLMs and GNNs: it leverages the zero-shot inference and reasoning capabilities of LLMs and employs a Graph-LLM-based active learning paradigm to enhance GNN performance. Extensive experiments demonstrate the effectiveness of our model in improving node classification accuracy with considerably limited labeled data, surpassing state-of-the-art baselines by significant margins.
Abstract: The popularization of social media increases user engagement and generates a large amount of user-oriented data. Among these data, text (e.g., tweets, blogs) particularly attracts researchers and speculators who infer user attributes (e.g., age, gender, location) to fulfill their intents. Generally, this line of work casts attribute inference as a text classification problem and leverages graph neural networks (GNNs) to utilize higher-level representations of the source texts. However, these text graphs are constructed over words, which incurs high memory consumption and performs poorly when only a few labeled texts are available. To address this challenge, we design a text-graph-based few-shot learning model for attribute inference on social media text data. Our model first constructs and refines a text graph using manifold learning and message passing, which offers a better trade-off between expressiveness and complexity. Afterwards, to further exploit cross-domain and unlabeled texts to improve few-shot performance, a hierarchical knowledge distillation is devised over the text graph, which derives better text representations and advances the model's generalization ability. Experiments on social media datasets demonstrate the state-of-the-art performance of our model on attribute inference with considerably fewer labeled texts.
Abstract: Social media has drastically reshaped the world, allowing billions of people to engage in interactive environments and conveniently create and share content with the public. Among these activities, text data (e.g., tweets, blogs) underpins basic yet important social interactions and generates a rich source of user-oriented information. While explicitly sensitive user data such as credentials are protected by all available means, the disclosure of private personal attributes (e.g., age, gender, location) through inference attacks is hard to avoid, especially now that powerful natural language processing (NLP) techniques can automate attribute inference from implicit text data. This puts users' attribute privacy at risk. To address this challenge, in this paper, we leverage the inherent vulnerability of machine learning to adversarial attacks and design a novel text-space Adversarial attack for Social Good, called Adv4SG. In other words, we cast the problem of protecting personal attribute privacy as an adversarial attack over social media text data to defend against NLP-based attribute inference attacks. More specifically, Adv4SG performs a sequence of word perturbations under given constraints such that the probed attribute can no longer be identified correctly. Different from prior work, we advance Adv4SG by accounting for properties of social media text and by introducing cost-effective mechanisms that expedite attribute obfuscation over text data in the black-box setting. Extensive experiments on real-world social media datasets demonstrate that our method effectively degrades inference accuracy at lower computational cost across different attribute settings, which substantially mitigates the impact of inference attacks and thus offers strong protection of user attribute privacy.
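To make the word-perturbation idea concrete, the sketch below shows a generic greedy black-box substitution loop, not the Adv4SG algorithm itself: it queries an attribute classifier (here a toy stand-in) and swaps one word at a time for the candidate substitute that most lowers the classifier's confidence in the probed attribute, under an edit budget. The classifier, the substitute vocabulary, and the budget are illustrative assumptions.

```python
# Generic greedy black-box word-substitution sketch (illustrative; not Adv4SG itself).
def obfuscate(text, predict_proba, target_label, substitutes, budget=5):
    """Greedily replace words to lower the classifier's confidence in target_label."""
    words = text.split()
    best_conf = predict_proba(" ".join(words))[target_label]
    edits = 0
    for i, word in enumerate(words):
        if edits >= budget:
            break
        best_cand, best_drop = None, 0.0
        for cand in substitutes.get(word.lower(), []):
            trial = words[:i] + [cand] + words[i + 1:]
            conf = predict_proba(" ".join(trial))[target_label]
            if best_conf - conf > best_drop:       # candidate that lowers confidence most
                best_cand, best_drop = cand, best_conf - conf
        if best_cand is not None:
            words[i] = best_cand                   # commit the best substitution
            best_conf -= best_drop
            edits += 1
    return " ".join(words)

# Toy stand-in for a black-box attribute classifier (assumption for illustration):
# label 1 is inferred whenever the word "adorable" appears in the text.
def toy_proba(text):
    return {0: 0.5, 1: 0.9 if "adorable" in text else 0.2}

subs = {"adorable": ["decent", "fine"]}
print(obfuscate("what an adorable little cafe", toy_proba, target_label=1, substitutes=subs))
```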
Abstract: Heart failure is a syndrome that occurs when the heart cannot pump enough blood and oxygen to support the other organs in the body. Identifying the underlying themes in the diagnostic codes and procedure reports of patients admitted for heart failure could reveal the clinical phenotypes associated with the condition, group patients with similar characteristics, and help predict patient outcomes such as length of stay. These clinical phenotypes usually have a probabilistic latent structure; since no previous work has identified phenotypes in the clinical notes of heart failure patients using a probabilistic framework or predicted their length of stay with data-driven, artificial-intelligence-based methods, we apply a natural language processing technique, topic modeling, to identify the themes present in the diagnostic codes and procedure reports of 1,200 patients admitted for heart failure at the University of Illinois Hospital and Health Sciences System (UI Health). Topic modeling identified twelve themes each in the diagnostic codes and the procedure reports, revealing phenotypes that reflect different perspectives on heart failure, characterize patient profiles, and expose new relationships among medical concepts. Each theme had a set of keywords, and each clinical note was labeled with two themes, one corresponding to its diagnostic codes and the other to its procedure reports, along with their percentage contributions. We used these themes and their percentage contributions to predict length of stay. We found that the themes discovered in the diagnostic codes and procedure reports together predicted the patients' length of stay with an accuracy of 61.1% and an Area under the Receiver Operating Characteristic Curve (ROC AUC) of 0.828.
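A minimal sketch of this kind of pipeline is shown below, assuming scikit-learn's LDA implementation: one topic model per note field (diagnostic codes and procedure reports), the per-document topic proportions concatenated as features, and a simple classifier for length of stay. The field names, the twelve-topic setting, and the toy records are illustrative assumptions, not the study's actual data or code.

```python
# Hypothetical sketch: LDA topic proportions per field as features for LoS prediction.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

def topic_features(docs, n_topics=12, seed=0):
    """Return per-document topic proportions for a list of raw text documents."""
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(counts)  # shape: (n_docs, n_topics)

# Placeholder admission records: diagnostic-code text, procedure-report text, and a
# binary long/short length-of-stay label (all toy values for illustration only).
diag_texts = ["i50.9 e11.9 i10", "i50.22 n18.3"]
proc_texts = ["echocardiogram report findings", "right heart catheterization report"]
los_labels = np.array([1, 0])

X = np.hstack([topic_features(diag_texts), topic_features(proc_texts)])
clf = LogisticRegression(max_iter=1000).fit(X, los_labels)
print(clf.predict_proba(X)[:, 1])  # predicted probability of a long stay
```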
Abstract: Local surrogate models have grown in popularity for explaining complex black-box models across diverse types of data, including text, tabular, and image data. One particular algorithm, LIME, continues to see use within the field of machine learning due to its inherently interpretable explanations and model-agnostic behavior. Despite this continued use, questions about the stability of LIME persist. Stability, the property that similar instances yield similar explanations, has been shown to be lacking in explanations generated for tabular and image data, both of which are continuous domains. Here we explore the stability of LIME's explanations on textual data and confirm the trend of instability shown in previous research for other data types.
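As a concrete illustration, the snippet below probes one simple notion of stability: it repeats LIME's explanation of the same document under different random seeds and measures how much the top-k feature sets overlap. This is a minimal sketch assuming the `lime` package and a toy scikit-learn text classifier, not the evaluation protocol used in the paper.

```python
# Repeat LIME text explanations with different seeds and compare top-k word overlap.
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy sentiment classifier standing in for any black-box text model (assumption).
train_texts = ["great movie, loved it", "terrible plot, boring",
               "loved the acting", "boring and dull"]
train_labels = [1, 0, 1, 0]
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train_texts, train_labels)

doc = "the plot was boring but the acting was great"
top_k = 5
feature_sets = []
for seed in range(5):
    explainer = LimeTextExplainer(class_names=["neg", "pos"], random_state=seed)
    exp = explainer.explain_instance(doc, model.predict_proba,
                                     num_features=top_k, num_samples=1000)
    feature_sets.append({word for word, _ in exp.as_list()})

def jaccard(a, b):
    return len(a & b) / len(a | b)

scores = [jaccard(feature_sets[i], feature_sets[j])
          for i in range(len(feature_sets)) for j in range(i + 1, len(feature_sets))]
print("mean pairwise Jaccard of top-k features:", sum(scores) / len(scores))
```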
Abstract: The Graph Convolution Network (GCN) is recognized as one of the most effective graph models for semi-supervised learning, but it extracts only first-order or low-order neighborhood information through information propagation, and its performance drops off with deeper structures. Existing approaches that deal with higher-order neighbors typically rely on powers of the adjacency matrix. In this paper, we assume a seemingly trivial condition: the higher-order neighborhood information may be similar to that of the first-order neighbors. Accordingly, we present an unsupervised approach that captures such similarities and learns the weight matrices of higher-order neighbors automatically through a Lasso that minimizes the feature loss between first-order and higher-order neighbors; based on these weights, we formulate a new convolutional filter for GCN to learn better node representations. Our model, called higher-order weighted GCN (HWGCN), achieves state-of-the-art results on a number of node classification tasks on the Cora, Citeseer and Pubmed datasets.
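To illustrate the flavor of this objective, the sketch below uses a simplified scalar-weight version: aggregate node features with powers of the normalized adjacency matrix, then fit a Lasso so that a weighted combination of the higher-order aggregations approximates the first-order one. This is a rough sketch of the stated idea, not the authors' HWGCN implementation; the toy graph, features, and Lasso penalty are placeholder choices.

```python
# Simplified sketch: learn per-order weights with a Lasso on aggregated features.
import numpy as np
from sklearn.linear_model import Lasso

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in GCN."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def higher_order_weights(A, X, max_order=4, alpha=1e-3):
    A_norm = normalize_adj(A)
    first = A_norm @ X                                       # first-order aggregation
    powers = [np.linalg.matrix_power(A_norm, k) @ X for k in range(2, max_order + 1)]
    # Regress the first-order features on the higher-order aggregations.
    H = np.stack([P.ravel() for P in powers], axis=1)        # (n*d, max_order-1)
    lasso = Lasso(alpha=alpha, fit_intercept=False).fit(H, first.ravel())
    return lasso.coef_                                       # one weight per order k >= 2

# Toy graph: 4 nodes on a path, 3-dimensional random features (illustrative only).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
print(higher_order_weights(A, X))
```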
Abstract: The explosive growth and increasing sophistication of Android malware call for new defensive techniques capable of protecting mobile users against novel threats. In this paper, we first extract runtime Application Programming Interface (API) call sequences from Android apps, and then analyze higher-level semantic relations within the ecosystem to comprehensively characterize the apps. To model different types of entities (i.e., app, API, IMEI, signature, affiliation) and the rich semantic relations among them, we construct a structural heterogeneous information network (HIN) and present a meta-path-based approach to depict relatedness over apps. To efficiently classify nodes (e.g., apps) in the constructed HIN, we propose the HinLearning method, which first obtains in-sample node embeddings and then learns representations of out-of-sample nodes without rerunning or adjusting the HIN embeddings. Afterwards, we design a deep neural network (DNN) classifier that takes the learned HIN representations as inputs for Android malware detection. A comprehensive experimental study on large-scale real sample collections from Tencent Security Lab is performed to compare various baselines. Promising experimental results demonstrate that our developed system, AiDroid, which integrates the proposed method, outperforms the alternatives in real-time Android malware detection. AiDroid has already been incorporated into the Tencent Mobile Security product, which serves millions of users worldwide.
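The snippet below illustrates the meta-path intuition on a toy example, assuming the entity relations are available as binary incidence matrices: the commuting matrix of a meta-path such as app-API-app counts the APIs shared by two apps and can serve as a relatedness score for downstream embedding and classification. The matrices, meta-path choices, and combination weights are illustrative assumptions, not the AiDroid implementation.

```python
# Toy meta-path relatedness over a heterogeneous information network (illustrative).
import numpy as np

# Incidence matrices: rows are apps, columns are APIs / signatures (placeholder data).
app_api = np.array([[1, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 0, 1, 1]], dtype=float)
app_sig = np.array([[1, 0],
                    [1, 0],
                    [0, 1]], dtype=float)

# Commuting matrices for two meta-paths: app-API-app and app-signature-app.
R_api = app_api @ app_api.T     # entry (i, j) = number of APIs shared by apps i and j
R_sig = app_sig @ app_sig.T     # entry (i, j) = 1 if apps i and j share a signature

# A simple weighted combination of meta-paths gives an overall app relatedness matrix.
relatedness = 0.7 * R_api + 0.3 * R_sig
print(relatedness)
```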