Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vikash Singh

Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

May 26, 2025

Debargha Ganguly, Vikash Singh, Sreehari Sankar, Biyao Zhang, Xuecen Zhang, Srinivasan Iyengar, Xiaotian Han, Amit Sharma, Shivkumar Kalyanaraman, Vipin Chaudhary

Abstract:Large language models (LLMs) show remarkable promise for democratizing automated reasoning by generating formal specifications. However, a fundamental tension exists: LLMs are probabilistic, while formal verification demands deterministic guarantees. This paper addresses this epistemological gap by comprehensively investigating failure modes and uncertainty quantification (UQ) in LLM-generated formal artifacts. Our systematic evaluation of five frontier LLMs reveals Satisfiability Modulo Theories (SMT) based autoformalization's domain-specific impact on accuracy (from +34.8% on logical tasks to -44.5% on factual ones), with known UQ techniques like the entropy of token probabilities failing to identify these errors. We introduce a probabilistic context-free grammar (PCFG) framework to model LLM outputs, yielding a refined uncertainty taxonomy. We find uncertainty signals are task-dependent (e.g., grammar entropy for logic, AUROC>0.93). Finally, a lightweight fusion of these signals enables selective verification, drastically reducing errors (14-100%) with minimal abstention, transforming LLM-driven formalization into a reliable engineering discipline.

Via

Access Paper or Ask Questions

Bayesian Binary Search

Oct 02, 2024

Vikash Singh, Matthew Khanzadeh, Vincent Davis, Harrison Rush, Emanuele Rossi, Jesse Shrader, Pietro Lio

Abstract:We present Bayesian Binary Search (BBS), a novel probabilistic variant of the classical binary search/bisection algorithm. BBS leverages machine learning/statistical techniques to estimate the probability density of the search space and modifies the bisection step to split based on probability density rather than the traditional midpoint, allowing for the learned distribution of the search space to guide the search algorithm. Search space density estimation can flexibly be performed using supervised probabilistic machine learning techniques (e.g., Gaussian process regression, Bayesian neural networks, quantile regression) or unsupervised learning algorithms (e.g., Gaussian mixture models, kernel density estimation (KDE), maximum likelihood estimation (MLE)). We demonstrate significant efficiency gains of using BBS on both simulated data across a variety of distributions and in a real-world binary search use case of probing channel balances in the Bitcoin Lightning Network, for which we have deployed the BBS algorithm in a production setting.

Via

Access Paper or Ask Questions

Channel Balance Interpolation in the Lightning Network via Machine Learning

May 20, 2024

Vincent, Emanuele Rossi, Vikash Singh

Figure 1 for Channel Balance Interpolation in the Lightning Network via Machine Learning

Figure 2 for Channel Balance Interpolation in the Lightning Network via Machine Learning

Figure 3 for Channel Balance Interpolation in the Lightning Network via Machine Learning

Figure 4 for Channel Balance Interpolation in the Lightning Network via Machine Learning

Abstract:The Bitcoin Lightning Network is a Layer 2 payment protocol that addresses Bitcoin's scalability by facilitating quick and cost effective transactions through payment channels. This research explores the feasibility of using machine learning models to interpolate channel balances within the network, which can be used for optimizing the network's pathfinding algorithms. While there has been much exploration in balance probing and multipath payment protocols, predicting channel balances using solely node and channel features remains an uncharted area. This paper evaluates the performance of several machine learning models against two heuristic baselines and investigates the predictive capabilities of various features. Our model performs favorably in experimental evaluation, outperforming by 10% against an equal split baseline where both edges are assigned half of the channel capacity.

Via

Access Paper or Ask Questions

GCExplainer: Human-in-the-Loop Concept-based Explanations for Graph Neural Networks

Jul 25, 2021

Lucie Charlotte Magister, Dmitry Kazhdan, Vikash Singh, Pietro Liò

Figure 1 for GCExplainer: Human-in-the-Loop Concept-based Explanations for Graph Neural Networks

Figure 2 for GCExplainer: Human-in-the-Loop Concept-based Explanations for Graph Neural Networks

Figure 3 for GCExplainer: Human-in-the-Loop Concept-based Explanations for Graph Neural Networks

Figure 4 for GCExplainer: Human-in-the-Loop Concept-based Explanations for Graph Neural Networks

Abstract:While graph neural networks (GNNs) have been shown to perform well on graph-based data from a variety of fields, they suffer from a lack of transparency and accountability, which hinders trust and consequently the deployment of such models in high-stake and safety-critical scenarios. Even though recent research has investigated methods for explaining GNNs, these methods are limited to single-instance explanations, also known as local explanations. Motivated by the aim of providing global explanations, we adapt the well-known Automated Concept-based Explanation approach (Ghorbani et al., 2019) to GNN node and graph classification, and propose GCExplainer. GCExplainer is an unsupervised approach for post-hoc discovery and extraction of global concept-based explanations for GNNs, which puts the human in the loop. We demonstrate the success of our technique on five node classification datasets and two graph classification datasets, showing that we are able to discover and extract high-quality concept representations by putting the human in the loop. We achieve a maximum completeness score of 1 and an average completeness score of 0.753 across the datasets. Finally, we show that the concept-based explanations provide an improved insight into the datasets and GNN models compared to the state-of-the-art explanations produced by GNNExplainer (Ying et al., 2019).

* Accepted as 3rd ICML Workshop on Human in the Loop Learning, 2021

Via

Access Paper or Ask Questions

Graph Representation Learning on Tissue-Specific Multi-Omics

Jul 25, 2021

Amine Amor, Pietro Lio', Vikash Singh, Ramon Viñas Torné, Helena Andres Terre

Figure 1 for Graph Representation Learning on Tissue-Specific Multi-Omics

Abstract:Combining different modalities of data from human tissues has been critical in advancing biomedical research and personalised medical care. In this study, we leverage a graph embedding model (i.e VGAE) to perform link prediction on tissue-specific Gene-Gene Interaction (GGI) networks. Through ablation experiments, we prove that the combination of multiple biological modalities (i.e multi-omics) leads to powerful embeddings and better link prediction performances. Our evaluation shows that the integration of gene methylation profiles and RNA-sequencing data significantly improves the link prediction performance. Overall, the combination of RNA-sequencing and gene methylation data leads to a link prediction accuracy of 71% on GGI networks. By harnessing graph representation learning on multi-omics data, our work brings novel insights to the current literature on multi-omics integration in bioinformatics.

* This paper was accepted at the 2021 ICML Workshop on Computational Biology

Via

Access Paper or Ask Questions

Quantum Graph Neural Networks

Sep 26, 2019

Guillaume Verdon, Trevor McCourt, Enxhell Luzhnica, Vikash Singh, Stefan Leichenauer, Jack Hidary

Figure 1 for Quantum Graph Neural Networks

Figure 2 for Quantum Graph Neural Networks

Figure 3 for Quantum Graph Neural Networks

Figure 4 for Quantum Graph Neural Networks

Abstract:We introduce Quantum Graph Neural Networks (QGNN), a new class of quantum neural network ansatze which are tailored to represent quantum processes which have a graph structure, and are particularly suitable to be executed on distributed quantum systems over a quantum network. Along with this general class of ansatze, we introduce further specialized architectures, namely, Quantum Graph Recurrent Neural Networks (QGRNN) and Quantum Graph Convolutional Neural Networks (QGCNN). We provide four example applications of QGNNs: learning Hamiltonian dynamics of quantum systems, learning how to create multipartite entanglement in a quantum network, unsupervised learning for spectral clustering, and supervised learning for graph isomorphism classification.

* 8 pages

Via

Access Paper or Ask Questions

Towards Probabilistic Generative Models Harnessing Graph Neural Networks for Disease-Gene Prediction

Jul 12, 2019

Vikash Singh, Pietro Lio'

Figure 1 for Towards Probabilistic Generative Models Harnessing Graph Neural Networks for Disease-Gene Prediction

Figure 2 for Towards Probabilistic Generative Models Harnessing Graph Neural Networks for Disease-Gene Prediction

Abstract:Disease-gene prediction (DGP) refers to the computational challenge of predicting associations between genes and diseases. Effective solutions to the DGP problem have the potential to accelerate the therapeutic development pipeline at early stages via efficient prioritization of candidate genes for various diseases. In this work, we introduce the variational graph auto-encoder (VGAE) as a promising unsupervised approach for learning powerful latent embeddings in disease-gene networks that can be used for the DGP problem, the first approach using a generative model involving graph neural networks (GNNs). In addition to introducing the VGAE as a promising approach to the DGP problem, we further propose an extension (constrained-VGAE or C-VGAE) which adapts the learning algorithm for link prediction between two distinct node types in heterogeneous graphs. We evaluate and demonstrate the effectiveness of the VGAE on general link prediction in a disease-gene association network and the C-VGAE on disease-gene prediction in the same network, using popular random walk driven methods as baselines. While the methodology presented demonstrates potential solely based on utilizing the topology of a disease-gene association network, it can be further enhanced and explored through the integration of additional biological networks such as gene/protein interaction networks and additional biological features pertaining to the diseases and genes represented in the disease-gene association network.

* Workshop on Computational Biology (WCB) at ICML 2019

Via

Access Paper or Ask Questions