Megha Khosla

GNN-MultiFix: Addressing the pitfalls for GNNs for multi-label node classification

Nov 21, 2024

Tianqi Zhao, Megha Khosla

Abstract: Graph neural networks (GNNs) have emerged as powerful models for learning representations of graph data, showing state-of-the-art results in various tasks. Nevertheless, the superiority of these methods is usually supported either by evaluating their performance on a small subset of benchmark datasets or by reasoning about their expressive power in terms of certain graph isomorphism tests. In this paper we critically analyse both these aspects through a transductive setting for the task of node classification. First, we delve deeper into the case of multi-label node classification, which offers a more realistic scenario and has been ignored in most of the related works. Through analysing the training dynamics for GNN methods, we highlight the failure of GNNs to learn over multi-label graph datasets even for the case of abundant training data. Second, we show that specifically for transductive node classification, even the most expressive GNN may fail to learn in the absence of node attributes and without using explicit label information as input. To overcome this deficit, we propose a straightforward approach, referred to as GNN-MultiFix, that integrates the feature, label, and positional information of a node. GNN-MultiFix demonstrates significant improvement across all the multi-label datasets. We release our code at https://anonymous.4open.science/r/Graph-MultiFix-4121.

Disentangled and Self-Explainable Node Representation Learning

Oct 28, 2024

Simone Piaggesi, André Panisson, Megha Khosla

Abstract: Node representations, or embeddings, are low-dimensional vectors that capture node properties, typically learned through unsupervised structural similarity objectives or supervised tasks. While recent efforts have focused on explaining graph model decisions, the interpretability of unsupervised node embeddings remains underexplored. To bridge this gap, we introduce DiSeNE (Disentangled and Self-Explainable Node Embedding), a framework that generates self-explainable embeddings in an unsupervised manner. Our method employs disentangled representation learning to produce dimension-wise interpretable embeddings, where each dimension is aligned with a distinct topological structure of the graph. We formalize novel desiderata for disentangled and interpretable embeddings, which drive our new objective functions, optimizing simultaneously for both interpretability and disentanglement. Additionally, we propose several new metrics to evaluate representation quality and human interpretability. Extensive experiments across multiple benchmark datasets demonstrate the effectiveness of our approach.

A data-centric approach for assessing progress of Graph Neural Networks

Jun 18, 2024

Tianqi Zhao, Ngan Thi Dong, Alan Hanjalic, Megha Khosla

Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art results in node classification tasks. However, most improvements are in multi-class classification, with less focus on the cases where each node could have multiple labels. The first challenge in studying multi-label node classification is the scarcity of publicly available datasets. To address this, we collected and released three real-world biological datasets and developed a multi-label graph generator with tunable properties. We also argue that traditional notions of homophily and heterophily do not apply well to multi-label scenarios. Therefore, we define homophily and Cross-Class Neighborhood Similarity for multi-label classification and investigate 9 collected multi-label datasets. Lastly, we conducted a large-scale comparative study with 8 methods across nine datasets to evaluate current progress in multi-label node classification. We release our code at https://github.com/Tianqi-py/MLGNC.

* Published in Data-centric Machine Learning Research Workshop @ ICML 2024

Model Selection with Model Zoo via Graph Learning

Apr 05, 2024

Ziyu Li, Hilco van der Wilk, Danning Zhan, Megha Khosla, Alessandro Bozzon, Rihan Hai

Abstract: Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Selecting the right pre-trained models is crucial, yet complicated by the diversity of models from various model families (like ResNet, ViT, Swin) and the hidden relationships between models and datasets. Existing methods, which utilize basic information from models and datasets to compute scores indicating model performance on target datasets, overlook the intrinsic relationships, limiting their effectiveness in model selection. In this study, we introduce TransferGraph, a novel framework that reformulates model selection as a graph learning problem. TransferGraph constructs a graph using extensive metadata extracted from models and datasets, while capturing their inherent relationships. Through comprehensive experiments across 16 real datasets, both images and texts, we demonstrate TransferGraph's effectiveness in capturing essential model-dataset relationships, yielding up to a 32% improvement in correlation between predicted performance and the actual fine-tuning results compared to the state-of-the-art methods.

* Accepted at 40th IEEE International Conference on Data Engineering (ICDE 2024)

Efficient Neural Ranking using Forward Indexes and Lightweight Encoders

Nov 02, 2023

Jurek Leonhardt, Henrik Müller, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

Abstract: Dual-encoder-based dense retrieval models have become the standard in IR. They employ large Transformer-based language models, which are notoriously inefficient in terms of resources and latency. We propose Fast-Forward indexes -- vector forward indexes which exploit the semantic matching capabilities of dual-encoder models for efficient and effective re-ranking. Our framework enables re-ranking at very high retrieval depths and combines the merits of both lexical and semantic matching via score interpolation. Furthermore, in order to mitigate the limitations of dual-encoders, we tackle two main challenges: Firstly, we improve computational efficiency by either pre-computing representations, avoiding unnecessary computations altogether, or reducing the complexity of encoders. This allows us to considerably improve ranking efficiency and latency. Secondly, we optimize the memory footprint and maintenance cost of indexes; we propose two complementary techniques to reduce the index size and show that, by dynamically dropping irrelevant document tokens, the index maintenance efficiency can be improved substantially. We perform an evaluation to show the effectiveness and efficiency of Fast-Forward indexes -- our method has low latency and achieves competitive results without the need for hardware acceleration, such as GPUs.

* Accepted at ACM TOIS. arXiv admin note: text overlap with arXiv:2110.06051
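
The score interpolation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a simple convex combination of a lexical score and a semantic score per document; the weight `alpha`, the function names, and the toy scores are hypothetical and are not taken from the paper or its code.

```python
def interpolate_scores(lexical, semantic, alpha=0.5):
    """Combine two score dicts as alpha * lexical + (1 - alpha) * semantic.

    `alpha` is a hypothetical interpolation weight in [0, 1]; only documents
    present in both dicts are scored.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        doc: alpha * lexical[doc] + (1.0 - alpha) * semantic[doc]
        for doc in lexical
        if doc in semantic
    }


def rerank(lexical, semantic, alpha=0.5):
    """Return document ids sorted by interpolated score, best first."""
    combined = interpolate_scores(lexical, semantic, alpha)
    return sorted(combined, key=combined.get, reverse=True)


# Toy scores (made up for illustration): a first-stage lexical retriever
# and a semantic scorer looked up from a precomputed forward index.
lexical_scores = {"d1": 10.0, "d2": 8.0, "d3": 2.0}
semantic_scores = {"d1": 1.0, "d2": 6.0, "d3": 8.0}

ranking = rerank(lexical_scores, semantic_scores, alpha=0.5)
```

Setting `alpha=1.0` reduces this to the lexical ranking and `alpha=0.0` to the semantic ranking, so the weight controls how much each signal contributes.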

DINE: Dimensional Interpretability of Node Embeddings

Oct 02, 2023

Simone Piaggesi, Megha Khosla, André Panisson, Avishek Anand

Abstract: Graphs are ubiquitous due to their flexibility in representing social and technological systems as networks of interacting elements. Graph representation learning methods, such as node embeddings, are powerful approaches to map nodes into a latent vector space, allowing their use for various graph tasks. Despite their success, only a few studies have focused on explaining node embeddings locally. Moreover, global explanations of node embeddings remain unexplored, limiting interpretability and debugging potential. We address this gap by developing human-understandable explanations for dimensions in node embeddings. Towards that, we first develop new metrics that measure the global interpretability of embedding vectors based on the marginal contribution of the embedding dimensions to predicting graph structure. We say that an embedding dimension is more interpretable if it can faithfully map to an understandable sub-structure in the input graph, such as community structure. Having observed that standard node embeddings have low interpretability, we then introduce DINE (Dimension-based Interpretable Node Embedding), a novel approach that can retrofit existing node embeddings by making them more interpretable without sacrificing their task performance. We conduct extensive experiments on synthetic and real-world graphs and show that we can simultaneously learn highly interpretable node embeddings with effective performance in link prediction.

Does Black-box Attribute Inference Attacks on Graph Neural Networks Constitute Privacy Risk?

Jun 01, 2023

Iyiola E. Olatunji, Anmar Hizber, Oliver Sihlovec, Megha Khosla

Abstract: Graph neural networks (GNNs) have shown promising results on real-life datasets and applications, including healthcare, finance, and education. However, recent studies have shown that GNNs are highly vulnerable to attacks such as membership inference attacks and link reconstruction attacks. Surprisingly, attribute inference attacks have received little attention. In this paper, we initiate the first investigation into attribute inference attacks, where an attacker aims to infer sensitive user attributes based on the user's public or non-sensitive attributes. We ask whether black-box attribute inference attacks constitute a significant privacy risk for graph-structured data and the corresponding GNN models. We take a systematic approach to launch the attacks by varying the adversarial knowledge and assumptions. Our findings reveal that when an attacker has black-box access to the target model, GNNs generally do not reveal significantly more information compared to missing value estimation techniques. Code is available.

Multi-label Node Classification On Graph-Structured Data

Apr 20, 2023

Tianqi Zhao, Ngan Thi Dong, Alan Hanjalic, Megha Khosla

Abstract: Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, besides defining homophily for the multi-label scenario, we develop a new approach that dynamically fuses the feature and label correlation information to learn label-informed representations. Finally, we perform a large-scale comparative study with 10 methods and 9 datasets, which also showcases the effectiveness of our approach. We release our benchmark at https://anonymous.4open.science/r/LFLF-5D8C/.

Privacy and Transparency in Graph Machine Learning: A Unified Perspective

Jul 22, 2022

Megha Khosla

Abstract: Graph Machine Learning (GraphML), whereby classical machine learning is generalized to irregular graph domains, has enjoyed a recent renaissance, leading to a dizzying array of models and their applications in several domains. With its growing applicability to sensitive domains and regulations by government agencies for trustworthy AI systems, researchers have started looking into the issues of transparency and privacy of graph learning. However, these topics have been mainly investigated independently. In this position paper, we provide a unified perspective on the interplay of privacy and transparency in GraphML.

Private Graph Extraction via Feature Explanations

Jun 29, 2022

Iyiola E. Olatunji, Mandeep Rathee, Thorben Funke, Megha Khosla

Abstract: Privacy and interpretability are two of the important ingredients for achieving trustworthy machine learning. We study the interplay of these two aspects in graph machine learning through graph reconstruction attacks. The goal of the adversary here is to reconstruct the graph structure of the training data given access to model explanations. Based on the different kinds of auxiliary information available to the adversary, we propose several graph reconstruction attacks. We show that additional knowledge of post-hoc feature explanations substantially increases the success rate of these attacks. Further, we investigate in detail the differences between attack performance with respect to three different classes of explanation methods for graph neural networks: gradient-based, perturbation-based, and surrogate model-based methods. While gradient-based explanations reveal the most in terms of the graph structure, we find that these explanations do not always score high in utility. For the other two classes of explanations, privacy leakage increases with an increase in explanation utility. Finally, we propose a defense based on a randomized response mechanism for releasing the explanations which substantially reduces the attack success rate. Our anonymized code is available.
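
To give a feel for the randomized response defense mentioned in the abstract above, here is a minimal sketch of the classical binary randomized response mechanism. The abstract does not specify the paper's exact mechanism, so everything here is an assumption: the explanation is modelled as a hypothetical binary feature mask, and `epsilon`, `randomized_response`, and the example mask are illustrative names and values only.

```python
import math
import random


def randomized_response(bits, epsilon, rng=None):
    """Release each bit truthfully with probability e^eps / (1 + e^eps).

    Otherwise the bit is flipped. This is the textbook binary randomized
    response, not necessarily the mechanism used in the paper.
    """
    rng = rng or random.Random()
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


# Hypothetical binary explanation mask: 1 marks a feature flagged as important.
explanation_mask = [1, 0, 1, 1, 0, 0, 1, 0]

# Smaller epsilon means more flips (more privacy, less faithful explanation).
noisy_mask = randomized_response(explanation_mask, epsilon=1.0, rng=random.Random(0))
```

With `epsilon = 0` every bit is kept or flipped with equal probability, so the released mask carries no information; as `epsilon` grows, the released mask approaches the original.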
