Abstract:Clustering, as an unsupervised technique, plays a pivotal role in various data analysis applications. Among clustering algorithms, Spectral Clustering on Euclidean Spaces has been extensively studied. However, as data grow more complex, Euclidean Space is proving inefficient both for representing data and for the learning algorithms that operate on it. Although Deep Neural Networks on hyperbolic spaces have gained recent traction, clustering algorithms and other non-deep machine learning models on non-Euclidean Spaces remain underexplored. In this paper, we propose a spectral clustering algorithm on Hyperbolic Spaces to address this gap. Hyperbolic Spaces offer advantages in representing complex data structures, such as hierarchical and tree-like structures, which cannot be embedded efficiently in Euclidean Spaces. Our proposed algorithm replaces the Euclidean Similarity Matrix with an appropriate Hyperbolic Similarity Matrix, demonstrating improved efficiency compared to clustering in Euclidean Spaces. Our contributions include the development of the spectral clustering algorithm on Hyperbolic Spaces and the proof of its weak consistency. We show that our algorithm converges at least as fast as Spectral Clustering on Euclidean Spaces. To illustrate the efficacy of our approach, we present experimental results on the Wisconsin Breast Cancer Dataset, highlighting the superior performance of Hyperbolic Spectral Clustering over its Euclidean counterpart. This work opens up avenues for utilizing non-Euclidean Spaces in clustering algorithms, offering new perspectives for handling complex data structures and improving clustering efficiency.
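The substitution of a hyperbolic similarity matrix for the Euclidean one can be illustrated with a minimal sketch. It assumes the Poincaré ball model of hyperbolic space and a Gaussian kernel on the geodesic distance; the abstract specifies neither, so the model, the kernel, the bandwidth `sigma`, and the function names are all illustrative assumptions rather than the paper's construction.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    # d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


def hyperbolic_similarity_matrix(points, sigma=1.0):
    """Symmetric similarity matrix W[i][j] = exp(-d(i, j)^2 / (2 sigma^2)).

    The Gaussian kernel is an assumed choice; any similarity that decays
    with hyperbolic distance could be swapped in.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = poincare_distance(points[i], points[j])
            W[i][j] = W[j][i] = math.exp(-d * d / (2.0 * sigma * sigma))
    return W
```

The resulting matrix would then take the place of the usual Euclidean affinity matrix in a standard spectral clustering pipeline (graph Laplacian, leading eigenvectors, $k$-means on the embedded rows).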
Abstract:Super-Resolution (SR) is a time-hallowed image processing problem that aims to improve the quality of a Low-Resolution (LR) sample up to the standard of its High-Resolution (HR) counterpart. We aim to address this by introducing Super-Resolution Generator (SuRGe), a fully-convolutional Generative Adversarial Network (GAN)-based architecture for SR. We show that distinct convolutional features obtained at increasing depths of a GAN generator can be optimally combined by a set of learnable convex weights to improve the quality of generated SR samples. In the process, we employ the Jensen-Shannon and the Gromov-Wasserstein losses respectively between the SR-HR and LR-SR pairs of distributions to further aid the generator of SuRGe to better exploit the available information in an attempt to improve SR. Moreover, we train the discriminator of SuRGe with the Wasserstein loss with gradient penalty, to primarily prevent mode collapse. The proposed SuRGe, as an end-to-end GAN workflow tailor-made for super-resolution, offers improved performance while maintaining low inference time. The efficacy of SuRGe is substantiated by its superior performance compared to 18 state-of-the-art contenders on 10 benchmark datasets.
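One common way to obtain learnable convex weights is to pass unconstrained parameters through a softmax, which guarantees non-negative weights that sum to one. The sketch below assumes that parameterization and treats each feature map as a flat list of numbers; the abstract does not state how SuRGe constrains its weights, so this is an illustration of the general idea only, with hypothetical function names.

```python
import math


def convex_weights(logits):
    """Map unconstrained parameters to non-negative weights summing to 1 (softmax)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_features(feature_maps, logits):
    """Convex combination of equally sized feature maps taken at different depths."""
    weights = convex_weights(logits)
    length = len(feature_maps[0])
    return [
        sum(w * fm[i] for w, fm in zip(weights, feature_maps))
        for i in range(length)
    ]
```

In a trained network the `logits` would be parameters updated by gradient descent alongside the convolutional filters; here they are plain numbers.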
Abstract:Accounting for the uncertainty of value functions boosts exploration in Reinforcement Learning (RL). Our work introduces Maximum Mean Discrepancy Q-Learning (MMD-QL) to improve Wasserstein Q-Learning (WQL) for uncertainty propagation during Temporal Difference (TD) updates. MMD-QL uses the MMD barycenter for this purpose, as MMD provides a tighter estimate of closeness between probability measures than the Wasserstein distance. Firstly, we establish that MMD-QL is Probably Approximately Correct in MDP (PAC-MDP) under the average loss metric. Concerning the accumulated rewards, experiments on tabular environments show that MMD-QL outperforms WQL and other algorithms. Secondly, we incorporate deep networks into MMD-QL to create MMD Q-Network (MMD-QN). Making reasonable assumptions, we analyze the convergence rates of MMD-QN using function approximation. Empirical results on challenging Atari games demonstrate that MMD-QN performs well compared to benchmark deep RL algorithms, highlighting its effectiveness in handling large state-action spaces.
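For readers unfamiliar with the discrepancy itself, the following is a minimal sketch of the biased empirical estimate of squared MMD between two one-dimensional samples. The Gaussian kernel and its `bandwidth` are assumptions made for illustration; the abstract does not specify the kernel, and this sketch does not implement the MMD barycenter or the Q-learning update.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when the two samples coincide and grows as their empirical distributions move apart.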
Abstract:Alzheimer's disease (AD), characterized by progressive cognitive decline and memory loss, presents a formidable global health challenge, underscoring the critical importance of early and precise diagnosis for timely interventions and enhanced patient outcomes. While MRI scans provide valuable insights into brain structures, traditional analysis methods often struggle to discern intricate 3D patterns crucial for AD identification. Addressing this challenge, we introduce an alternative end-to-end deep learning model, the 3D Hybrid Compact Convolutional Transformers (3D HCCT). By synergistically combining convolutional neural networks (CNNs) and vision transformers (ViTs), the 3D HCCT adeptly captures both local features and long-range relationships within 3D MRI scans. Extensive evaluations on the prominent AD benchmark dataset, ADNI, demonstrate the 3D HCCT's superior performance, surpassing state-of-the-art CNN and transformer-based methods in classification accuracy. Its robust generalization capability and interpretability mark a significant stride in AD classification from 3D MRI scans, promising more accurate and reliable diagnoses for improved patient care and superior clinical outcomes.
Abstract:The state-of-the-art audio deepfake detectors leveraging deep neural networks exhibit impressive recognition performance. Nonetheless, this advantage is accompanied by a significant carbon footprint. This is mainly due to the use of high-performance computing with accelerators and long training times. Studies show that an average deep NLP model produces around 626k lbs of CO\textsubscript{2}, which is equivalent to five times the lifetime emissions of an average US car. This is certainly a massive threat to the environment. To tackle this challenge, this study presents a novel framework for audio deepfake detection that can be seamlessly trained using standard CPU resources. Our proposed framework utilizes off-the-shelf self-supervised learning (SSL)-based models which are pre-trained and available in public repositories. In contrast to existing methods that fine-tune SSL models and employ additional deep neural networks for downstream tasks, we exploit classical machine learning algorithms such as logistic regression and shallow neural networks on the SSL embeddings extracted with the pre-trained model. Our approach shows competitive results compared to the commonly used high-carbon footprint approaches. In experiments with the ASVspoof 2019 LA dataset, we achieve a 0.90\% equal error rate (EER) with less than 1k trainable model parameters. To encourage further research in this direction and support reproducible results, the Python code will be made publicly accessible following acceptance. GitHub: https://github.com/sahasubhajit/Speech-Spoofing-
Abstract:Graph Transformers (GTs) facilitate the comprehension of graph-structured data by calculating the self-attention of node pairs without considering node position information. To address this limitation, we propose an efficient framework that incorporates Positional Encodings (PEs) into the Transformer, generating a set of learnable positional encodings in the hyperbolic space, a non-Euclidean domain. This approach empowers us to explore diverse options for optimal selection of PEs for specific downstream tasks, leveraging hyperbolic neural networks or hyperbolic graph convolutional networks. Additionally, we repurpose these positional encodings to mitigate the impact of over-smoothing in deep Graph Neural Networks (GNNs). Comprehensive experiments on molecular benchmark datasets, co-author, and co-purchase networks substantiate the effectiveness of hyperbolic positional encodings in enhancing the performance of deep GNNs.
Abstract:Variational Autoencoders (VAEs) have been a pioneering force in the realm of deep generative models. Among their many descendants, Wasserstein Autoencoders (WAEs) stand out in particular due to the dual offering of heightened generative quality and a strong theoretical backbone. WAEs consist of an encoding and a decoding network forming a bottleneck with the prime objective of generating new samples resembling the ones they were trained on. In the process, they aim to achieve a target latent representation of the encoded data. Our work is an attempt to offer a theoretical understanding of the machinery behind WAEs. From a statistical viewpoint, we pose the problem as concurrent density estimation tasks based on neural network-induced transformations. This allows us to establish deterministic upper bounds on the realized errors WAEs commit. We also analyze the propagation of these stochastic errors in the presence of adversaries. As a result, both the large sample properties of the reconstructed distribution and the resilience of WAE models are explored.
Abstract:Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These encompass a heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When confronted with data containing noisy or outlier-laden observations, the Median-of-Means (MoM) estimator emerges as a stabilizing force for any centroid-based clustering framework. On a different note, a prevalent constraint among existing clustering methodologies resides in the prerequisite knowledge of the number of clusters prior to analysis. Utilizing model-based methodologies, such as Bayesian nonparametric models, offers the advantage of infinite mixture models, thereby circumventing the need for such requirements. Motivated by these facts, in this article, we present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies that mitigates the effect of noise on the quality of clustering while ensuring that the number of clusters need not be specified in advance. Statistical guarantees on the upper bound of clustering error, and rigorous assessment through simulated and real datasets suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
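The robustness of the Median-of-Means (MoM) estimator mentioned above can be seen in a minimal one-dimensional sketch: the data are split into blocks, each block is averaged, and the median of the block means is returned. The interleaved, unshuffled partition and the function name are simplifying assumptions; in practice the blocks are usually formed at random, and this sketch does not reproduce the clustering method itself.

```python
import statistics


def median_of_means(data, num_blocks):
    """Median of the per-block means of a list of numbers.

    Blocks are formed by interleaving (data[i::num_blocks]), an assumed,
    deterministic stand-in for a random partition.
    """
    if not 1 <= num_blocks <= len(data):
        raise ValueError("num_blocks must be between 1 and len(data)")
    blocks = [data[i::num_blocks] for i in range(num_blocks)]
    block_means = [sum(block) / len(block) for block in blocks]
    return statistics.median(block_means)
```

A single gross outlier can corrupt at most one block mean, so the median of the block means stays close to the bulk of the data, whereas the ordinary mean is dragged towards the outlier.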
Abstract:Evolutionary algorithms (EA), a class of stochastic search methods based on the principles of natural evolution, have received widespread acclaim for their exceptional performance in various real-world optimization problems. While researchers worldwide have proposed a wide variety of EAs, certain limitations remain, such as slow convergence speed and poor generalization capabilities. Consequently, numerous scholars actively explore improvements to algorithmic structures, operators, search patterns, etc., to enhance their optimization performance. Reinforcement learning (RL) integrated as a component in the EA framework has demonstrated superior performance in recent years. This paper presents a comprehensive survey on integrating reinforcement learning into the evolutionary algorithm, referred to as reinforcement learning-assisted evolutionary algorithm (RL-EA). We begin with the conceptual outlines of reinforcement learning and the evolutionary algorithm. We then provide a taxonomy of RL-EA. Subsequently, we discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature. The RL-assisted procedure is divided according to the implemented functions including solution generation, learnable objective function, algorithm/operator/sub-population selection, parameter adaptation, and other strategies. Finally, we analyze potential directions for future research. This survey serves as a rich resource for researchers interested in RL-EA as it overviews the current state-of-the-art and highlights the associated challenges. By leveraging this survey, readers can swiftly gain insights into RL-EA to develop efficient algorithms, thereby fostering further advancements in this emerging field.
Abstract:In machine learning and computer vision, mean shift (MS) qualifies as one of the most popular mode-seeking algorithms used for clustering and image segmentation. It iteratively moves each data point to the weighted mean of its neighborhood data points. The computational cost required to find the neighbors of each data point is quadratic in the number of data points. Consequently, the vanilla MS appears to be very slow for large-scale datasets. To address this issue, we propose a mode-seeking algorithm called GridShift, which is principally based on MS and offers a significant speedup. To accelerate the neighbor search, GridShift employs a grid-based approach that is linear in the number of data points. In addition, GridShift moves the active grid cells (grid cells associated with at least one data point) in place of data points towards the higher density, a step that provides more speedup. The runtime of GridShift is linear in the number of active grid cells and exponential in the number of features. Therefore, it is ideal for large-scale low-dimensional applications such as object tracking and image segmentation. Through extensive experiments, we showcase the superior performance of GridShift compared to other MS-based as well as state-of-the-art algorithms in terms of accuracy and runtime on benchmark datasets for image segmentation. Finally, we provide a new object-tracking algorithm based on GridShift and show promising results for object tracking compared to CamShift and meanshift++.
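The grid-based neighbor search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the GridShift algorithm as published: points are hashed into cells of side `cell_size`, each active cell stores a count and a coordinate sum, and one update moves every active cell to the mean of all points in its 3^d neighboring cells. The neighborhood shape, the unweighted mean, and the function names are all assumptions.

```python
from itertools import product


def build_grid(points, cell_size):
    """Hash points into grid cells: cell index -> (count, per-coordinate sums).

    A single pass over the data, so the cost is linear in the number of points.
    """
    grid = {}
    for p in points:
        key = tuple(int(x // cell_size) for x in p)
        count, sums = grid.get(key, (0, [0.0] * len(p)))
        grid[key] = (count + 1, [s + x for s, x in zip(sums, p)])
    return grid


def shift_cells(grid):
    """One update: each active cell moves to the mean of the points stored in
    itself and its adjacent cells (3^d offsets, exponential in the dimension d)."""
    if not grid:
        return {}
    dim = len(next(iter(grid)))
    offsets = list(product((-1, 0, 1), repeat=dim))
    shifted = {}
    for key in grid:
        total, acc = 0, [0.0] * dim
        for off in offsets:
            neighbor = tuple(k + o for k, o in zip(key, off))
            if neighbor in grid:  # only active cells contribute
                count, sums = grid[neighbor]
                total += count
                acc = [a + s for a, s in zip(acc, sums)]
        shifted[key] = [a / total for a in acc]
    return shifted
```

A full mode-seeking procedure would repeat the update, re-hash the shifted cells, and merge cells that land in the same grid location until nothing moves; those steps are omitted here.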