Abstract:Tensor decompositions play a crucial role in numerous applications related to multi-way data analysis. By employing a Bayesian framework with sparsity-inducing priors, Bayesian Tensor Ring (BTR) factorization offers probabilistic estimates and an effective approach for automatically adapting the tensor ring rank during the learning process. However, the previous BTR method employs an Automatic Relevance Determination (ARD) prior, which can lead to sub-optimal solutions. Besides, it focuses solely on continuous data, whereas many applications involve discrete data. More importantly, it relies on the Coordinate-Ascent Variational Inference (CAVI) algorithm, which is inadequate for handling large tensors with extensive observations. These limitations greatly restrict its scale and scope of application, making it suitable only for small-scale problems, such as image/video completion. To address these issues, we propose a novel BTR model that incorporates a nonparametric Multiplicative Gamma Process (MGP) prior, known for its superior accuracy in identifying latent structures. To handle discrete data, we introduce the P\'olya-Gamma augmentation for closed-form updates. Furthermore, we develop an efficient Gibbs sampler for consistent posterior simulation, which reduces the computational complexity of the previous VI algorithm by two orders, and an online EM algorithm that is scalable to extremely large tensors. To showcase the advantages of our model, we conduct extensive experiments on both simulated data and real-world applications.
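As a rough illustration of the shrinkage behind the MGP prior named in the abstract above, the following minimal sketch draws column precisions in the usual multiplicative gamma process form, tau_h = delta_1 * ... * delta_h. The function name, the hyperparameter values a1 and a2, and the way the precisions are applied to a plain factor matrix are assumptions for illustration; the abstract does not specify how the prior is placed on the tensor ring cores.

```python
import numpy as np

def sample_mgp_precisions(max_rank, a1=2.0, a2=3.0, seed=0):
    """Draw column precisions tau_1..tau_R from a multiplicative gamma process.

    Hypothetical helper: a1 and a2 are illustrative values, not the paper's.
    """
    rng = np.random.default_rng(seed)
    # delta_1 ~ Gamma(a1, 1) and delta_l ~ Gamma(a2, 1) for l >= 2.
    delta = np.concatenate([
        rng.gamma(a1, 1.0, size=1),
        rng.gamma(a2, 1.0, size=max_rank - 1),
    ])
    # tau_h is the running product of the deltas.
    return np.cumprod(delta)

tau = sample_mgp_precisions(max_rank=8)
rng = np.random.default_rng(1)
# Zero-mean Gaussian columns whose variance is 1 / tau_h (assumed usage).
factor = rng.normal(0.0, 1.0 / np.sqrt(tau), size=(100, tau.size))
```

With a2 > 1 the precisions tend to grow with the column index, so later columns are shrunk toward zero more strongly, which is what allows the effective rank to be inferred rather than fixed in advance.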
Abstract:This paper introduces a novel family of generalized exponentiated gradient (EG) updates derived from an Alpha-Beta divergence regularization function. Collectively referred to as EGAB, the proposed updates belong to the category of multiplicative gradient algorithms for positive data and demonstrate considerable flexibility by controlling iteration behavior and performance through three hyperparameters: $\alpha$, $\beta$, and the learning rate $\eta$. To enforce a unit $l_1$ norm constraint for nonnegative weight vectors within generalized EGAB algorithms, we develop two slightly distinct approaches. One method exploits scale-invariant loss functions, while the other relies on gradient projections onto the feasible domain. As an illustration of their applicability, we evaluate the proposed updates in addressing the online portfolio selection problem (OLPS) using gradient-based methods. Here, they not only offer a unified perspective on the search directions of various OLPS algorithms (including the standard exponentiated gradient and diverse mean-reversion strategies), but also facilitate smooth interpolation and extension of these updates due to the flexibility in hyperparameter selection. Simulation results confirm that the adaptability of these generalized gradient updates can effectively enhance the performance for some portfolios, particularly in scenarios involving transaction costs.
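For orientation, the standard exponentiated gradient update for portfolio selection, which the abstract above names as one of the cases the EGAB family covers, can be sketched as follows. This is only the classical EG step with renormalization to unit $l_1$ norm, not the generalized Alpha-Beta updates; the function name, the learning rate value, and the example price relatives are assumptions for illustration.

```python
import numpy as np

def eg_update(weights, price_relatives, eta=0.05):
    """One step of the standard exponentiated gradient (EG) portfolio update."""
    weights = np.asarray(weights, dtype=float)
    price_relatives = np.asarray(price_relatives, dtype=float)
    # Gradient of the log-return log(w . x) with respect to w.
    grad = price_relatives / np.dot(weights, price_relatives)
    # Multiplicative step keeps every weight positive.
    unnormalized = weights * np.exp(eta * grad)
    # Renormalize so the weights have unit l1 norm (stay on the simplex).
    return unnormalized / unnormalized.sum()

# Illustrative data: three assets, uniform starting portfolio.
w = np.full(3, 1.0 / 3.0)
x = np.array([1.02, 0.97, 1.00])
w_next = eg_update(w, x)
```

Starting from uniform weights, the update shifts weight toward the asset with the largest price relative while keeping the weights positive and summing to one.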
Abstract:Automated diagnosis with artificial intelligence has emerged as a promising area in medical imaging, but the interpretability of the deep neural networks involved remains an urgent concern. Although contemporary works, such as XProtoNet and MProtoNet, have sought to design interpretable prediction models to address this issue, the localization precision of their resulting attribution maps can be further improved. To this end, we propose a Multi-scale Attentive Prototypical part Network, termed MAProtoNet, to provide more precise attribution maps. Specifically, we introduce a concise multi-scale module to merge attentive features from quadruplet attention layers and produce attribution maps. The proposed quadruplet attention layers enhance the existing online class activation mapping loss by capturing interactions between the spatial and channel dimensions, while the multi-scale module fuses both fine-grained and coarse-grained information for precise map generation. We also apply a novel multi-scale mapping loss to supervise the proposed multi-scale module. Compared to existing interpretable prototypical part networks in medical imaging, MAProtoNet achieves state-of-the-art localization performance on brain tumor segmentation (BraTS) datasets, with an approximately 4% overall improvement in activation precision score (and a best score of 85.8%), without using additional annotated segmentation labels. Our code will be released at https://github.com/TUAT-Novice/maprotonet.
Abstract:Diffusion model (DM) based adversarial purification (AP) has been shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models are themselves not robust to adversarial attacks. Additionally, the diffusion process can easily destroy semantic information, so that the reverse process generates a high-quality image that is totally different from the original input image, leading to degraded standard accuracy. To overcome these issues, a natural idea is to use an adversarial training strategy to retrain or fine-tune the pre-trained diffusion model, but this is computationally prohibitive. We propose a novel robust reverse process with adversarial guidance, which is independent of the given pre-trained DMs and avoids retraining or fine-tuning them. This robust guidance not only ensures that the purified examples retain more semantic content but also, for the first time, mitigates the accuracy-robustness trade-off of DMs, and it gives DM-based AP an efficient ability to adapt to new attacks. Extensive experiments demonstrate that our method achieves state-of-the-art results and generalizes across different attacks.
Abstract:Deep neural networks are known to be vulnerable to well-designed adversarial attacks. The most successful defense technique, based on adversarial training (AT), can achieve optimal robustness against particular attacks but cannot generalize well to unseen attacks. Another effective defense technique, based on adversarial purification (AP), can enhance generalization but cannot achieve optimal robustness. Meanwhile, both methods share a common limitation: degraded standard accuracy. To mitigate these issues, we propose a novel framework called Adversarial Training on Purification (AToP), which comprises two components: perturbation destruction by random transforms (RT) and a purifier model fine-tuned (FT) with an adversarial loss. RT is essential to avoid overlearning to known attacks, which lets robustness generalize to unseen attacks, and FT is essential for improving robustness. To evaluate our method in an efficient and scalable way, we conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNette, which demonstrate that our method achieves state-of-the-art results and generalizes to unseen attacks.
Abstract:In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-linear structures, such as CP and Tucker formats. Therefore, they may not be effective enough to handle complex real-world datasets. To address these issues, we propose ENTED, an \underline{E}fficient \underline{N}onparametric \underline{TE}nsor \underline{D}ecomposition for binary and count tensors. Specifically, we first employ a nonparametric Gaussian process (GP) to replace traditional multi-linear structures. Next, we utilize the P\'olya-Gamma augmentation, which provides a unified framework to establish conjugate models for binary and count distributions. Finally, to address the computational issue of GPs, we enhance the model by incorporating sparse orthogonal variational inference of inducing points, which offers a more effective covariance approximation within GPs and stochastic natural gradient updates for nonparametric models. We evaluate our model on several real-world tensor completion tasks, considering binary and count datasets. The results demonstrate both the better performance and the computational advantages of the proposed model.
Abstract:With large training datasets and massive computing resources, large language models (LLMs) achieve remarkable performance in comprehension and generation. A model built on such a powerful LLM and fine-tuned with domain-specific datasets possesses more specialized knowledge and is thus more practical, as in the case of medical LLMs. However, existing fine-tuned medical LLMs are limited to general medical knowledge in English. For disease-specific problems, the model's responses are inaccurate and sometimes even completely irrelevant, especially when a language other than English is used. In this work, we focus on epilepsy in the Japanese language and introduce a customized LLM termed EpilepsyLLM. Our model is obtained by fine-tuning a pre-trained LLM on datasets from the epilepsy domain. The datasets contain basic information about the disease, common treatment methods and drugs, and important notes for daily life and work. The experimental results demonstrate that EpilepsyLLM can provide more reliable and specialized medical knowledge responses.
Abstract:Electroencephalography (EEG) is essential for the diagnosis of epilepsy, but it requires expertise and experience to identify abnormalities. It is thus crucial to develop automated models for the detection of abnormal EEGs related to epilepsy. This paper describes the development of a novel class of compact and efficient convolutional neural networks (CNNs) for detecting abnormal time intervals and electrodes in EEGs for epilepsy. The designed model, called multichannel EEGNet (mEEGNet), is inspired by EEGNet, a CNN developed for brain-computer interfacing. Unlike EEGNet, the proposed mEEGNet has the same number of electrode inputs and outputs to detect abnormalities. The mEEGNet was evaluated with a clinical dataset consisting of 29 cases of juvenile and childhood absence epilepsy labeled by a clinical expert. The labels were given to paroxysmal discharges visually observed in both ictal (seizure) and interictal (nonseizure) intervals. Results showed that the mEEGNet detected abnormal EEGs with an area under the curve, F1-values, and sensitivity equivalent to or higher than those of existing CNNs. Moreover, its number of parameters is much smaller than that of other CNN models. To our knowledge, the dataset of absence epilepsy validated with machine learning through this research is the largest in the literature.
Abstract:A yuru-chara is a mascot character created by local governments and companies to publicize information on areas and products. Because creating a yuru-chara incurs various costs, machine learning techniques such as generative adversarial networks (GANs) are expected to be useful. In recent years, it has been reported that using class conditions in a dataset for GAN training stabilizes learning and improves the quality of the generated images. However, it is difficult to apply class conditional GANs when the amount of original data is small and when no clear class is given, as is the case for yuru-chara images. In this paper, we propose a class conditional GAN based on clustering and data augmentation. Specifically, we first performed clustering based on K-means++ on the yuru-chara image dataset and converted it into a class conditional dataset. Next, data augmentation was performed on the class conditional dataset to increase the amount of data fivefold. In addition, we built a model that incorporates ResBlock and self-attention into a network based on a class conditional GAN and trained it on the class conditional yuru-chara dataset. Evaluation of the generated images confirmed that the choice of clustering method affects the generated images.
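The two preprocessing steps described in the abstract above, clustering to obtain pseudo-classes and fivefold augmentation, can be sketched roughly as follows. This is a minimal NumPy-only illustration under stated assumptions: the features are raw flattened pixels, the images are random placeholders, and the four augmenting transforms (one flip and three small shifts) are hypothetical, since the abstract does not say which features or transforms were used.

```python
import numpy as np

def kmeans_pp_labels(features, n_clusters, n_iter=20, seed=0):
    """Cluster rows of `features` with k-means++ seeding and Lloyd iterations."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # k-means++ seeding: first center uniformly at random, later centers with
    # probability proportional to squared distance to the nearest chosen center.
    centers = [features[rng.integers(n)]]
    for _ in range(1, n_clusters):
        diffs = features[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(-1).min(axis=1)
        centers.append(features[rng.choice(n, p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    # Final assignment against the updated centers.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def augment_fivefold(images, labels):
    """Return the originals plus four transformed copies (hypothetical transforms)."""
    variants = [
        images,
        images[:, :, ::-1],          # horizontal flip
        np.roll(images, 2, axis=2),  # shift right by 2 pixels
        np.roll(images, -2, axis=2), # shift left by 2 pixels
        np.roll(images, 2, axis=1),  # shift down by 2 pixels
    ]
    return np.concatenate(variants), np.tile(labels, len(variants))

# Placeholder data: 12 grayscale 8x8 "images".
images = np.random.default_rng(0).random((12, 8, 8))
pseudo_labels = kmeans_pp_labels(images.reshape(len(images), -1), n_clusters=3)
aug_images, aug_labels = augment_fivefold(images, pseudo_labels)
```

The cluster indices play the role of class conditions, so each augmented copy inherits the pseudo-label of its source image.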
Abstract:The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to preserve their symmetric positive-definiteness. Unlike conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels, each with its own parameters, which makes the regressor more flexible. The kernel adaptive filtering algorithm is established together with an l1-regularized least squares to avoid overfitting and growth of the dictionary dimensionality. Experimental results confirm the validity of the proposed method.
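The kernel and regressor described in the abstract above can be sketched as follows. This is a minimal illustration that assumes the form k(x) = exp(-(x - c)^T P (x - c)) for center c and SPD precision matrix P; the exact scaling of the exponent, the function names, and the example values are assumptions, and the paper's update rules on the SPD manifold are not reproduced here.

```python
import numpy as np

def generalized_gaussian_kernel(x, center, precision):
    """Gaussian kernel with a center vector and an SPD precision matrix."""
    diff = x - center
    return np.exp(-diff @ precision @ diff)

def regressor(x, coeffs, centers, precisions):
    """Superposition of Gaussian kernels, each with its own center and precision."""
    return sum(
        h * generalized_gaussian_kernel(x, c, P)
        for h, c, P in zip(coeffs, centers, precisions)
    )

# Illustrative dictionary with two kernels in two dimensions.
rng = np.random.default_rng(0)
centers = [rng.normal(size=2) for _ in range(2)]
precisions = []
for _ in range(2):
    A = rng.normal(size=(2, 2))
    precisions.append(A @ A.T + 0.1 * np.eye(2))  # SPD by construction
coeffs = [0.5, -1.2]
x = np.array([0.3, -0.4])
y_hat = regressor(x, coeffs, centers, precisions)

# Special case: P = I / (2 sigma^2) recovers the scalar-width Gaussian kernel.
sigma = 0.7
k_scalar = generalized_gaussian_kernel(x, centers[0], np.eye(2) / (2 * sigma**2))
```

Choosing the precision matrix as a scaled identity recovers the usual scalar-width kernel, which is the sense in which the precision matrix generalizes the width parameter.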