Abstract: Recent advances in specialized hardware for solving optimization problems, such as quantum computers, quantum annealers, and CMOS annealers, open up new ways of solving complex real-world problems. However, given current and near-term hardware limitations, the number of variables required to express a large real-world problem easily exceeds the hardware's capacity, so hybrid methods are usually developed in order to utilize the hardware. In this work, we advocate for the development of hybrid methods that are built on top of the frameworks of existing state-of-the-art heuristics, thereby improving these methods. We demonstrate this by building on the so-called Louvain method, one of the most popular algorithms for the community detection problem, and developing an Ising-based Louvain method. The proposed method outperforms two state-of-the-art community detection algorithms in clustering several small- to large-scale graphs. The results show promise in adapting the same optimization approach to other unsupervised learning heuristics to improve their performance.
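To make the phrase "Ising-based" concrete, the following is a minimal sketch of how a two-way community split can be written as an Ising objective. It uses the standard two-community modularity form Q = s^T B s / (4m) with B_ij = A_ij - k_i k_j / (2m); the abstract does not state the paper's actual formulation, so this is an assumption for illustration only. The function names and the example graph `TWO_TRIANGLES` are hypothetical, and the exhaustive search stands in for an annealer.

```python
from itertools import product

# Hypothetical example graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
TWO_TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def modularity_matrix(adj):
    """Return B with B[i][j] = A[i][j] - k_i * k_j / (2m) for an undirected graph."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # the degree sum equals 2m
    return [[adj[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
            for i in range(n)]

def ising_modularity(adj, spins):
    """Modularity of a two-way split encoded as spins in {-1, +1}: s^T B s / (4m)."""
    n = len(adj)
    b = modularity_matrix(adj)
    two_m = sum(sum(row) for row in adj)
    total = sum(b[i][j] * spins[i] * spins[j] for i in range(n) for j in range(n))
    return total / (2 * two_m)  # 2 * (2m) = 4m

def best_bipartition(adj):
    """Exhaustive stand-in for an annealer; s_0 is fixed to +1 to break symmetry."""
    best_spins, best_q = None, float("-inf")
    for rest in product((1, -1), repeat=len(adj) - 1):
        spins = (1,) + rest
        q = ising_modularity(adj, spins)
        if q > best_q:
            best_spins, best_q = spins, q
    return best_spins, best_q

spins, q = best_bipartition(TWO_TRIANGLES)
```

On this toy graph the search separates the two triangles. Exhaustive enumeration is only feasible for a handful of nodes, which is where specialized hardware or a hybrid scheme would take over.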
Abstract: Many fundamental problems in data mining can be reduced to one or more NP-hard combinatorial optimization problems. Recent advances in novel technologies such as quantum and quantum-inspired hardware promise a substantial speedup for solving these problems compared to general-purpose computers, but often require the problem to be modeled in a special form, such as an Ising or QUBO model, in order to take advantage of these devices. In this work, we focus on the important binary matrix factorization (BMF) problem, which has many applications in data mining. We propose two QUBO formulations for BMF and show how clustering constraints can easily be incorporated into these formulations. The special-purpose hardware we consider is limited in the number of variables it can handle, which presents a challenge when factorizing large matrices. We propose a sampling-based approach to overcome this challenge, allowing us to factorize large rectangular matrices. We run experiments on the Fujitsu Digital Annealer, a quantum-inspired CMOS annealer, on both synthetic and real data, including gene expression data. These experiments show that our approach is able to produce more accurate BMFs than competing methods.
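As an illustration of what modeling BMF as a QUBO can look like, the sketch below treats only the rank-1 case, which is not necessarily either of the two formulations the abstract refers to. For binary A, u, and v, the squared error (A_ij - u_i v_j)^2 expands to A_ij + (1 - 2 A_ij) u_i v_j, which is quadratic in the binary variables. All function names are hypothetical, and the exhaustive search stands in for the annealer.

```python
from itertools import product

def rank1_bmf_qubo(matrix):
    """Return (offset, coeffs) so that the squared error of u v^T against the
    binary matrix A equals offset + sum_ij coeffs[i][j] * u_i * v_j."""
    offset = sum(sum(row) for row in matrix)          # sum of A_ij (A_ij^2 = A_ij)
    coeffs = [[1 - 2 * a for a in row] for row in matrix]
    return offset, coeffs

def qubo_energy(offset, coeffs, u, v):
    """Evaluate the QUBO objective for binary vectors u and v."""
    return offset + sum(coeffs[i][j] * u[i] * v[j]
                        for i in range(len(u)) for j in range(len(v)))

def squared_error(matrix, u, v):
    """Direct reconstruction error, used here to check the QUBO expansion."""
    return sum((matrix[i][j] - u[i] * v[j]) ** 2
               for i in range(len(u)) for j in range(len(v)))

def brute_force_rank1(matrix):
    """Exhaustive stand-in for an annealer: minimize the QUBO over all (u, v)."""
    rows, cols = len(matrix), len(matrix[0])
    offset, coeffs = rank1_bmf_qubo(matrix)
    best = None
    for u in product((0, 1), repeat=rows):
        for v in product((0, 1), repeat=cols):
            energy = qubo_energy(offset, coeffs, u, v)
            if best is None or energy < best[0]:
                best = (energy, u, v)
    return best

# Hypothetical 3x3 input containing a single 2x2 block of ones.
BLOCK = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
energy, u, v = brute_force_rank1(BLOCK)
```

The search space grows as 2^(rows + cols), so this enumeration only works for tiny inputs; it is meant to show the objective, not a practical solver.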
Abstract: The emergence of specialized optimization hardware such as CMOS annealers and adiabatic quantum computers carries the promise of solving hard combinatorial optimization problems more efficiently in hardware. Recent work has focused on formulating different combinatorial optimization problems as Ising models, the core mathematical abstraction used by a large number of these hardware platforms, and evaluating the performance of these models when solved on specialized hardware. An interesting area of application is data mining, where combinatorial optimization problems underlie many core tasks. In this work, we focus on consensus clustering (clustering aggregation), an important combinatorial problem that has received much attention over the last two decades. We present two Ising models for consensus clustering and evaluate them using the Fujitsu Digital Annealer, a quantum-inspired CMOS annealer. Our empirical evaluation shows that our approach outperforms existing techniques and is a promising direction for future research.