Abstract:Real-world graphs naturally exhibit hierarchical or cyclical structures that are unfit for the typical Euclidean space. While there exist graph neural networks that leverage hyperbolic or spherical spaces to learn representations that embed such structures more accurately, these methods are confined under the message-passing paradigm, making the models vulnerable against side-effects such as oversmoothing and oversquashing. More recent work have proposed global attention-based graph Transformers that can easily model long-range interactions, but their extensions towards non-Euclidean geometry are yet unexplored. To bridge this gap, we propose Fully Product-Stereographic Transformer, a generalization of Transformers towards operating entirely on the product of constant curvature spaces. When combined with tokenized graph Transformers, our model can learn the curvature appropriate for the input graph in an end-to-end fashion, without the need of additional tuning on different curvature initializations. We also provide a kernelized approach to non-Euclidean attention, which enables our model to run in time and memory cost linear to the number of nodes and edges while respecting the underlying geometry. Experiments on graph reconstruction and node classification demonstrate the benefits of generalizing Transformers to the non-Euclidean domain.
Abstract:We tackle the problem of feature unlearning from a pretrained image generative model. Unlike a common unlearning task where an unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle from facial images, from the pretrained generative models. As the target feature is only presented in a local region of an image, unlearning the entire image from the pretrained model may result in losing other details in the remaining region of the image. To specify which features to unlearn, we develop an implicit feedback mechanism where a user can select images containing the target feature. From the implicit feedback, we identify a latent representation corresponding to the target feature and then use the representation to unlearn the generative model. Our framework is generalizable for the two well-known families of generative models: GANs and VAEs. Through experiments on MNIST and CelebA datasets, we show that target features are successfully removed while keeping the fidelity of the original models.
Abstract:We propose a Gaussian manifold variational auto-encoder (GM-VAE) whose latent space consists of a set of diagonal Gaussian distributions. It is known that the set of the diagonal Gaussian distributions with the Fisher information metric forms a product hyperbolic space, which we call a Gaussian manifold. To learn the VAE endowed with the Gaussian manifold, we first propose a pseudo Gaussian manifold normal distribution based on the Kullback-Leibler divergence, a local approximation of the squared Fisher-Rao distance, to define a density over the latent space. With the newly proposed distribution, we introduce geometric transformations at the last and the first of the encoder and the decoder of VAE, respectively to help the transition between the Euclidean and Gaussian manifolds. Through the empirical experiments, we show competitive generalization performance of GM-VAE against other variants of hyperbolic- and Euclidean-VAEs. Our model achieves strong numerical stability, which is a common limitation reported with previous hyperbolic-VAEs.
Abstract:We present a rotated hyperbolic wrapped normal distribution (RoWN), a simple yet effective alteration of a hyperbolic wrapped normal distribution (HWN). The HWN expands the domain of probabilistic modeling from Euclidean to hyperbolic space, where a tree can be embedded with arbitrary low distortion in theory. In this work, we analyze the geometric properties of the diagonal HWN, a standard choice of distribution in probabilistic modeling. The analysis shows that the distribution is inappropriate to represent the data points at the same hierarchy level through their angular distance with the same norm in the Poincar\'e disk model. We then empirically verify the presence of limitations of HWN, and show how RoWN, the newly proposed distribution, can alleviate the limitations on various hierarchical datasets, including noisy synthetic binary tree, WordNet, and Atari 2600 Breakout.
Abstract:Crowdsourcing systems enable us to collect noisy labels from crowd workers. A graphical model representing local dependencies between workers and tasks provides a principled way of reasoning over the true labels from the noisy answers. However, one needs a predictive model working on unseen data directly from crowdsourced datasets instead of the true labels in many cases. To infer true labels and learn a predictive model simultaneously, we propose a new data-generating process, where a neural network generates the true labels from task features. We devise an EM framework alternating variational inference and deep learning to infer the true labels and to update the neural network, respectively. Experimental results with synthetic and real datasets show a belief-propagation-based EM algorithm is robust to i) corruption in task features, ii) multi-modal or mismatched worker prior, and iii) few spammers submitting noises to many tasks.
Abstract:Consistency training, which exploits both supervised and unsupervised learning with different augmentations on image, is an effective method of utilizing unlabeled data in semi-supervised learning (SSL) manner. Here, we present another version of the method with Grad-CAM consistency loss, so it can be utilized in training model with better generalization and adjustability. We show that our method improved the baseline ResNet model with at most 1.44 % and 0.31 $\pm$ 0.59 %p accuracy improvement on average with CIFAR-10 dataset. We conducted ablation study comparing to using only psuedo-label for consistency training. Also, we argue that our method can adjust in different environments when targeted to different units in the model. The code is available: https://github.com/gimme1dollar/gradcam-consistency-semi-sup.
Abstract:With rise of interventional cardiology, Catheter Ablation Therapy (CAT) has established itself as a first-line solution to treat cardiac arrhythmia. Although CAT is a promising technique, cardiologist lacks vision inside the body during the procedure, which may cause serious clinical syndromes. To support accurate clinical procedure, Contact Force Sensing (CFS) system is developed to find a position of the catheter tip through the measure of contact force between catheter and heart tissue. However, the practical usability of commercialized CFS systems is not fully understood due to inaccuracy in the measurement. To support the development of more accurate system, we develop a full pipeline of CFS system with newly collected benchmark dataset through a contact force sensing catheter in simplest hardware form. Our dataset was roughly collected with human noise to increase data diversity. Through the analysis of the dataset, we identify a problem defined as Shift of Reference (SoR), which prevents accurate measurement of contact force. To overcome the problem, we conduct the contact force estimation via standard deep neural networks including for Recurrent Neural Network (RNN), Fully Convolutional Network (FCN) and Transformer. An average error in measurement for RNN, FCN and Transformer are, respectively, 2.46g, 3.03g and 3.01g. Through these studies, we try to lay a groundwork, serve a performance criteria for future CFS system research and open a publicly available dataset to public.