Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ikuro Sato

Rectified Lagrangian for Out-of-Distribution Detection in Modern Hopfield Networks

Feb 19, 2025

Ryo Moriai, Nakamasa Inoue, Masayuki Tanaka, Rei Kawakami, Satoshi Ikehata, Ikuro Sato

Abstract:Modern Hopfield networks (MHNs) have recently gained significant attention in the field of artificial intelligence because they can store and retrieve a large set of patterns with an exponentially large memory capacity. A MHN is generally a dynamical system defined with Lagrangians of memory and feature neurons, where memories associated with in-distribution (ID) samples are represented by attractors in the feature space. One major problem in existing MHNs lies in managing out-of-distribution (OOD) samples because it was originally assumed that all samples are ID samples. To address this, we propose the rectified Lagrangian (RegLag), a new Lagrangian for memory neurons that explicitly incorporates an attractor for OOD samples in the dynamical system of MHNs. RecLag creates a trivial point attractor for any interaction matrix, enabling OOD detection by identifying samples that fall into this attractor as OOD. The interaction matrix is optimized so that the probability densities can be estimated to identify ID/OOD. We demonstrate the effectiveness of RecLag-based MHNs compared to energy-based OOD detection methods, including those using state-of-the-art Hopfield energies, across nine image datasets.

* Accepted to AAAI 2025

Via

Access Paper or Ask Questions

GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields

Oct 27, 2024

Yusuke Sekikawa, Chingwei Hsu, Satoshi Ikehata, Rei Kawakami, Ikuro Sato

Abstract:We propose Gumbel-NeRF, a mixture-of-expert (MoE) neural radiance fields (NeRF) model with a hindsight expert selection mechanism for synthesizing novel views of unseen objects. Previous studies have shown that the MoE structure provides high-quality representations of a given large-scale scene consisting of many objects. However, we observe that such a MoE NeRF model often produces low-quality representations in the vicinity of experts' boundaries when applied to the task of novel view synthesis of an unseen object from one/few-shot input. We find that this deterioration is primarily caused by the foresight expert selection mechanism, which may leave an unnatural discontinuity in the object shape near the experts' boundaries. Gumbel-NeRF adopts a hindsight expert selection mechanism, which guarantees continuity in the density field even near the experts' boundaries. Experiments using the SRN cars dataset demonstrate the superiority of Gumbel-NeRF over the baselines in terms of various image quality metrics.

* 7 pages. Presented at ICIP2024

Via

Access Paper or Ask Questions

Fixed-Weight Difference Target Propagation

Dec 19, 2022

Tatsukichi Shibuya, Nakamasa Inoue, Rei Kawakami, Ikuro Sato

Figure 1 for Fixed-Weight Difference Target Propagation

Figure 2 for Fixed-Weight Difference Target Propagation

Figure 3 for Fixed-Weight Difference Target Propagation

Figure 4 for Fixed-Weight Difference Target Propagation

Abstract:Target Propagation (TP) is a biologically more plausible algorithm than the error backpropagation (BP) to train deep networks, and improving practicality of TP is an open issue. TP methods require the feedforward and feedback networks to form layer-wise autoencoders for propagating the target values generated at the output layer. However, this causes certain drawbacks; e.g., careful hyperparameter tuning is required to synchronize the feedforward and feedback training, and frequent updates of the feedback path are usually required than that of the feedforward path. Learning of the feedforward and feedback networks is sufficient to make TP methods capable of training, but is having these layer-wise autoencoders a necessary condition for TP to work? We answer this question by presenting Fixed-Weight Difference Target Propagation (FW-DTP) that keeps the feedback weights constant during training. We confirmed that this simple method, which naturally resolves the abovementioned problems of TP, can still deliver informative target values to hidden layers for a given task; indeed, FW-DTP consistently achieves higher test performance than a baseline, the Difference Target Propagation (DTP), on four classification datasets. We also present a novel propagation architecture that explains the exact form of the feedback function of DTP to analyze FW-DTP.

* Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23). 9 pages and 3 figures in main manuscript; 11 pages and 5 figures in supplementary material

Via

Access Paper or Ask Questions

Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Nov 18, 2022

Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas

Figure 1 for Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Figure 2 for Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Figure 3 for Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Figure 4 for Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Abstract:Modern deep learning systems are fragile and do not generalize well under distribution shifts. While much promising work has been accomplished to address these concerns, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We address the problem settings for image and text classification using DomainBed, WILDS, and Backgrounds Challenge as out-of-distribution datasets for the exhaustive study. We search over a wide range of hyperparameters and examine the classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. We arrive at the following findings: i) contrary to conventional wisdom, adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum-based SGD), ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset - linear returns, increasing returns, and diminishing returns. We believe these findings can help practitioners choose the right optimizer and know what behavior to expect.

* NeurIPS2022 Workshop on Distribution Shifts (DistShift)

Via

Access Paper or Ask Questions

Informative Sample-Aware Proxy for Deep Metric Learning

Nov 18, 2022

Aoyu Li, Ikuro Sato, Kohta Ishikawa, Rei Kawakami, Rio Yokota

Figure 1 for Informative Sample-Aware Proxy for Deep Metric Learning

Figure 2 for Informative Sample-Aware Proxy for Deep Metric Learning

Figure 3 for Informative Sample-Aware Proxy for Deep Metric Learning

Figure 4 for Informative Sample-Aware Proxy for Deep Metric Learning

Abstract:Among various supervised deep metric learning methods proxy-based approaches have achieved high retrieval accuracies. Proxies, which are class-representative points in an embedding space, receive updates based on proxy-sample similarities in a similar manner to sample representations. In existing methods, a relatively small number of samples can produce large gradient magnitudes (ie, hard samples), and a relatively large number of samples can produce small gradient magnitudes (ie, easy samples); these can play a major part in updates. Assuming that acquiring too much sensitivity to such extreme sets of samples would deteriorate the generalizability of a method, we propose a novel proxy-based method called Informative Sample-Aware Proxy (Proxy-ISA), which directly modifies a gradient weighting factor for each sample using a scheduled threshold function, so that the model is more sensitive to the informative samples. Extensive experiments on the CUB-200-2011, Cars-196, Stanford Online Products and In-shop Clothes Retrieval datasets demonstrate the superiority of Proxy-ISA compared with the state-of-the-art methods.

* Accepted at ACM Multimedia Asia (MMAsia) 2022

Via

Access Paper or Ask Questions

PoF: Post-Training of Feature Extractor for Improving Generalization

Jul 05, 2022

Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

Figure 1 for PoF: Post-Training of Feature Extractor for Improving Generalization

Figure 2 for PoF: Post-Training of Feature Extractor for Improving Generalization

Figure 3 for PoF: Post-Training of Feature Extractor for Improving Generalization

Figure 4 for PoF: Post-Training of Feature Extractor for Improving Generalization

Abstract:It has been intensively investigated that the local shape, especially flatness, of the loss landscape near a minimum plays an important role for generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search a flatter minimum. The characteristics are two-fold: 1) Feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner aiming to reduce a part of test loss caused by the positive loss curvature. We provide a theoretical analysis that shows the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance against baseline methods on both CIFAR-10 and CIFAR-100 datasets for only 10-epoch post-training, and on SVHN dataset for 50-epoch post-training. Source code is available at: \url{https://github.com/DensoITLab/PoF-v1

* Accepted to ICML2022. Contains a link to the code

Via

Access Paper or Ask Questions

Feature Space Particle Inference for Neural Network Ensembles

Jun 02, 2022

Shingo Yashima, Teppei Suzuki, Kohta Ishikawa, Ikuro Sato, Rei Kawakami

Figure 1 for Feature Space Particle Inference for Neural Network Ensembles

Figure 2 for Feature Space Particle Inference for Neural Network Ensembles

Figure 3 for Feature Space Particle Inference for Neural Network Ensembles

Figure 4 for Feature Space Particle Inference for Neural Network Ensembles

Abstract:Ensembles of deep neural networks demonstrate improved performance over single models. For enhancing the diversity of ensemble members while keeping their performance, particle-based inference methods offer a promising approach from a Bayesian perspective. However, the best way to apply these methods to neural networks is still unclear: seeking samples from the weight-space posterior suffers from inefficiency due to the over-parameterization issues, while seeking samples directly from the function-space posterior often results in serious underfitting. In this study, we propose optimizing particles in the feature space where the activation of a specific intermediate layer lies to address the above-mentioned difficulties. Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness. Extensive evaluation on real-world datasets shows that our model significantly outperforms the gold-standard Deep Ensembles on various metrics, including accuracy, calibration, and robustness. Code is available at https://github.com/DensoITLab/featurePI .

* ICML2022

Via

Access Paper or Ask Questions

Implicit Neural Representations for Variable Length Human Motion Generation

Mar 25, 2022

Pablo Cervantes, Yusuke Sekikawa, Ikuro Sato, Koichi Shinoda

Figure 1 for Implicit Neural Representations for Variable Length Human Motion Generation

Figure 2 for Implicit Neural Representations for Variable Length Human Motion Generation

Figure 3 for Implicit Neural Representations for Variable Length Human Motion Generation

Figure 4 for Implicit Neural Representations for Variable Length Human Motion Generation

Abstract:We propose an action-conditional human motion generation method using variational implicit neural representations (INR). The variational formalism enables action-conditional distributions of INRs, from which one can easily sample representations to generate novel human motion sequences. Our method offers variable-length sequence generation by construction because a part of INR is optimized for a whole sequence of arbitrary length with temporal embeddings. In contrast, previous works reported difficulties with modeling variable-length sequences. We confirm that our method with a Transformer decoder outperforms all relevant methods on HumanAct12, NTU-RGBD, and UESTC datasets in terms of realism and diversity of generated motions. Surprisingly, even our method with an MLP decoder consistently outperforms the state-of-the-art Transformer-based auto-encoder. In particular, we show that variable-length motions generated by our method are better than fixed-length motions generated by the state-of-the-art method in terms of realism and diversity.

Via

Access Paper or Ask Questions

Adversarial Transformations for Semi-Supervised Learning

Nov 18, 2019

Teppei Suzuki, Ikuro Sato

Figure 1 for Adversarial Transformations for Semi-Supervised Learning

Figure 2 for Adversarial Transformations for Semi-Supervised Learning

Figure 3 for Adversarial Transformations for Semi-Supervised Learning

Figure 4 for Adversarial Transformations for Semi-Supervised Learning

Abstract:We propose a Regularization framework based on Adversarial Transformations (RAT) for semi-supervised learning. RAT is designed to enhance robustness of the output distribution of class prediction for a given data against input perturbation. RAT is an extension of Virtual Adversarial Training (VAT) in such a way that RAT adversarialy transforms data along the underlying data distribution by a rich set of data transformation functions that leave class label invariant, whereas VAT simply produces adversarial additive noises. In addition, we verified that a technique of gradually increasing of perturbation region further improve the robustness. In experiments, we show that RAT significantly improves classification performance on CIFAR-10 and SVHN compared to existing regularization methods under standard semi-supervised image classification settings.

* Accepted by AAAI 2020

Via

Access Paper or Ask Questions

Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

Jun 04, 2019

Ikuro Sato, Kohta Ishikawa, Guoqing Liu, Masayuki Tanaka

Figure 1 for Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

Figure 2 for Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

Figure 3 for Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

Figure 4 for Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

Abstract:This study addresses an issue of co-adaptation between a feature extractor and a classifier in a neural network. A naive joint optimization of a feature extractor and a classifier often brings situations in which an excessively complex feature distribution adapted to a very specific classifier degrades the test performance. We introduce a method called Feature-extractor Optimization through Classifier Anonymization (FOCA), which is designed to avoid an explicit co-adaptation between a feature extractor and a particular classifier by using many randomly-generated, weak classifiers during optimization. We put forth a mathematical proposition that states the FOCA features form a point-like distribution within the same class in a class-separable fashion under special conditions. Real-data experiments under more general conditions provide supportive evidences.

* 9 pages. Accepted to ICML 2019

Via

Access Paper or Ask Questions