Abstract: Fidelity quantum kernels have shown promise in classification tasks, particularly when a group structure in the data can be identified and exploited through a covariant feature map. Indeed, there exist classification problems on which covariant kernels provide a provable advantage, establishing a separation between quantum and classical learners. Their practical application, however, poses two challenges: on the one hand, the group structure may be unknown or only approximate in real-world data; on the other, scaling to the 'utility' regime (above 100 qubits) suffers from exponential concentration. In this work, we address both challenges by applying fidelity kernels to real-world data with unknown structure, related to the scheduling of a fleet of electric vehicles, and to synthetic data generated from a union of subspaces, a structure shared by many relevant real-world datasets. Furthermore, we propose a novel error mitigation strategy tailored to fidelity kernels, called Bit Flip Tolerance (BFT), to alleviate exponential concentration in our utility-scale experiments. Our multiclass classification reaches accuracies comparable to classical SVCs on up to 156 qubits, constituting the largest experimental demonstration of quantum machine learning on IBM devices to date. For the real-world data experiments, the effect of the proposed BFT becomes manifest beyond 40 qubits, where mitigated accuracies reach 80%, in line with classical baselines, compared to 33% without BFT. On the union-of-subspaces synthetic dataset with 156 qubits, we demonstrate a mitigated accuracy of 80%, compared to 83% for classical models and 37% for the unmitigated quantum kernel, using a test set of limited size.
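A minimal sketch of the two ingredients named above, under illustrative assumptions: a fidelity kernel K(x, x') = |<phi(x)|phi(x')>|^2 for a simple product-state angle-encoding feature map (not the covariant map of the paper), plugged into a classical SVC, plus a toy shot-based estimator in the spirit of Bit Flip Tolerance that also accepts measurement outcomes within a small Hamming distance of the all-zeros string. All names, the feature map, and the tolerance rule are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: a fidelity kernel for a product-state angle
# encoding, |phi(x)> = tensor_i (cos(x_i/2)|0> + sin(x_i/2)|1>), which admits
# the closed form K(x, x') = prod_i cos^2((x_i - x'_i)/2). The covariant
# feature map and exact BFT rule of the paper are NOT reproduced here.
import numpy as np
from sklearn.svm import SVC

def fidelity_kernel(X1, X2):
    """Gram matrix K[i, j] = |<phi(x_i)|phi(x'_j)>|^2 for the toy map above."""
    diff = X1[:, None, :] - X2[None, :, :]            # shape (n1, n2, d)
    return np.prod(np.cos(diff / 2.0) ** 2, axis=-1)  # shape (n1, n2)

def bft_fidelity(counts, tol=1):
    """Toy bit-flip-tolerant estimator (an assumed reading of BFT): on
    hardware, fidelity is the probability of the all-zeros outcome of
    U(x')^dag U(x); here any bitstring with <= tol ones is also accepted."""
    shots = sum(counts.values())
    kept = sum(c for b, c in counts.items() if b.count("1") <= tol)
    return kept / shots

# Toy usage: precomputed quantum kernel fed to a classical SVC.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, np.pi, size=(40, 4))
y = (X.sum(axis=1) > 2.0 * np.pi).astype(int)
svc = SVC(kernel="precomputed").fit(fidelity_kernel(X, X), y)
print("train accuracy:", svc.score(fidelity_kernel(X, X), y))
print("BFT estimate:", bft_fidelity({"0000": 900, "0001": 60, "0011": 40}))
```

The tolerance parameter trades noise robustness against overestimating the fidelity; the paper's actual BFT procedure should be consulted for the precise acceptance rule.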
Abstract: Graph Neural Networks (GNNs) have gained popularity in healthcare and other domains due to their ability to process multi-modal and multi-relational graphs. However, efficient training of GNNs remains challenging, with several open research questions. In this paper, we investigate how the flow of embedding information within GNNs affects the prediction of links in Knowledge Graphs (KGs). Specifically, we propose a mathematical model that decouples the GNN connectivity from the connectivity of the graph data, and we evaluate the performance of GNNs in a clinical triage use case. Our results demonstrate that incorporating domain knowledge into the GNN connectivity leads to better performance than using the same connectivity as the KG or allowing unconstrained embedding propagation. Moreover, we show that negative edges play a crucial role in achieving good predictions, and that using too many GNN layers can degrade performance.
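As an illustration of decoupling message passing from the data graph, here is a minimal sketch, assuming a single GCN-style layer and a dot-product link decoder. A_kg, A_gnn, and the upper-triangular restriction are hypothetical stand-ins for the paper's domain-informed connectivity, and, as the abstract emphasizes, the candidate edge list in training would mix positive KG edges with sampled negative edges.

```python
# Illustrative sketch only: embeddings propagate along a chosen GNN
# connectivity A_gnn that can differ from the KG adjacency A_kg; link scores
# are then read off candidate KG edges. The single GCN-style layer and all
# names here are hypothetical stand-ins, not the paper's model.
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step H' = relu(D^-1 (A + I) H W) over the *chosen*
    message-passing graph A, which need not equal the data graph."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))       # row-normalise
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)

def score_links(H, edges):
    """Dot-product decoder over candidate (u, v) pairs; in training these
    would include both positive KG edges and sampled negative edges."""
    return np.array([H[u] @ H[v] for u, v in edges])

rng = np.random.default_rng(0)
n, d = 6, 8
A_kg = rng.integers(0, 2, size=(n, n))             # the data graph (KG)
A_gnn = np.triu(A_kg, k=1)                         # stand-in for domain-informed flow
H = gcn_layer(A_gnn, rng.normal(size=(n, d)), rng.normal(size=(d, d)))
print(score_links(H, [(0, 1), (2, 3)]))            # positive and negative candidates
```

The point of the separation is that A_gnn becomes a design choice: it can encode domain knowledge (e.g. which clinical entity types may exchange information) rather than mirroring the KG edges verbatim.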
Abstract: Recent research in representation learning utilizes large databases of proteins or molecules to acquire knowledge of drug and protein structures through unsupervised learning techniques. These pre-trained representations have proven to significantly enhance the accuracy of subsequent tasks, such as predicting the affinity between drugs and target proteins. In this study, we demonstrate that by incorporating knowledge graphs from diverse sources and modalities into sequence or SMILES representations, we can further enrich the representations and achieve state-of-the-art results on established benchmark datasets. We provide preprocessed and integrated data obtained from 7 public sources, encompassing over 30M triples. We also make available the pre-trained models based on this data, along with their reported performance on three widely used benchmark datasets for drug-target binding affinity prediction from the Therapeutics Data Commons (TDC), as well as the source code for training models on these benchmarks. Our objective in releasing these pre-trained models, accompanied by clean data for model pretraining and benchmark results, is to encourage research in knowledge-enhanced representation learning.
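A minimal sketch of the fusion idea, assuming late fusion by concatenation of a pretrained sequence/SMILES embedding with a KG entity embedding, followed by a simple affinity regressor. The random vectors, dimensions, and Ridge model are placeholders for illustration, not the released models or encoders.

```python
# Illustrative sketch only: late fusion of a pretrained sequence/SMILES
# embedding with a KG-derived entity embedding, then a simple regressor for
# binding affinity. Random vectors and Ridge are placeholders.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_pairs, d_seq, d_kg = 200, 32, 16

seq_emb = rng.normal(size=(n_pairs, d_seq))   # stand-in for a protein/SMILES encoder
kg_emb = rng.normal(size=(n_pairs, d_kg))     # stand-in for KG embeddings (e.g. TransE)
X_fused = np.hstack([seq_emb, kg_emb])        # knowledge-enhanced representation
y = rng.normal(size=n_pairs)                  # stand-in affinity labels (e.g. pKd)

model = Ridge(alpha=1.0).fit(X_fused, y)
print("train R^2:", model.score(X_fused, y))
```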