Kazuhiko Kawamoto

Cross-Model Transfer of Task Vectors via Few-Shot Orthogonal Alignment

May 17, 2025

Kazuhiko Kawamoto, Atsuhiro Endo, Hiroshi Kera

Abstract:Task arithmetic enables efficient model editing by representing task-specific changes as vectors in parameter space. Task arithmetic typically assumes that the source and target models are initialized from the same pre-trained parameters. This assumption limits its applicability in cross-model transfer settings, where models are independently pre-trained on different datasets. To address this challenge, we propose a method based on few-shot orthogonal alignment, which aligns task vectors to the parameter space of a differently pre-trained target model. These transformations preserve key properties of task vectors, such as norm and rank, and are learned using only a small number of labeled examples. We evaluate the method using two Vision Transformers pre-trained on YFCC100M and LAION400M, and test on eight classification datasets. Experimental results show that our method improves transfer accuracy over direct task vector application and achieves performance comparable to few-shot fine-tuning, while maintaining the modularity and reusability of task vectors. Our code is available at https://github.com/kawakera-lab/CrossModelTransfer.

* 8 pages
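As a rough illustration of the idea in the abstract above, the sketch below forms a task vector as the difference between fine-tuned and pre-trained weights, then maps it with orthogonal matrices before adding it to a differently pre-trained target. It is a minimal sketch under stated assumptions, not the authors' implementation: the two-sided form `U @ tau @ V^T`, the fixed 2x2 rotations standing in for learned orthogonal transforms, the toy weight values, and all function names are hypothetical. The few-shot learning of the transforms is not shown.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_norm(a):
    return math.sqrt(sum(x * x for row in a for x in row))

def rotation(angle):
    """2x2 rotation matrix: a simple example of an orthogonal matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def task_vector(theta_ft, theta_pre):
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return [[f - p for f, p in zip(rf, rp)]
            for rf, rp in zip(theta_ft, theta_pre)]

def align(tau, u, v):
    """Assumed alignment form: U @ tau @ V^T with orthogonal U and V."""
    return matmul(matmul(u, tau), transpose(v))

def apply_task_vector(theta, tau, scale=1.0):
    """Task arithmetic edit: add a (scaled) task vector to model weights."""
    return [[t + scale * d for t, d in zip(rt, rd)]
            for rt, rd in zip(theta, tau)]

# Toy 2x2 "weight matrices" (illustrative values only).
theta_pre_src = [[1.0, 0.0], [0.0, 1.0]]   # source model, pre-trained
theta_ft_src = [[1.5, 0.2], [-0.1, 0.9]]   # source model, fine-tuned
theta_pre_tgt = [[0.8, 0.1], [0.3, 1.1]]   # target model, pre-trained elsewhere

tau = task_vector(theta_ft_src, theta_pre_src)
u, v = rotation(0.3), rotation(-0.7)       # stand-ins for learned transforms
tau_aligned = align(tau, u, v)
theta_tgt_edited = apply_task_vector(theta_pre_tgt, tau_aligned)

# Orthogonal transforms leave the Frobenius norm of the task vector unchanged.
print(frobenius_norm(tau), frobenius_norm(tau_aligned))
```

Because `U` and `V` are orthogonal, the aligned task vector keeps the same Frobenius norm (and rank) as the original, which matches the properties the abstract says the transformations preserve.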

Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution

Mar 04, 2025

Ru Ito, Supatta Viriyavisuthisakul, Kazuhiko Kawamoto, Hiroshi Kera

Abstract:Most super-resolution (SR) models struggle with real-world low-resolution (LR) images. This issue arises because the degradation characteristics in the synthetic datasets differ from those in real-world LR images. Since SR models are trained on pairs of high-resolution (HR) and LR images generated by downsampling, they are optimized for simple degradation. However, real-world LR images contain complex degradation caused by factors such as the imaging process and JPEG compression. Due to these differences in degradation characteristics, most SR models perform poorly on real-world LR images. This study proposes a dataset generation method using undertrained image reconstruction models. These models have the property of reconstructing low-quality images with diverse degradation from input images. By leveraging this property, this study generates LR images with diverse degradation from HR images to construct the datasets. Fine-tuning pre-trained SR models on our generated datasets improves noise removal and blur reduction, enhancing performance on real-world LR images. Furthermore, an analysis of the datasets reveals that degradation diversity contributes to performance improvements, whereas color differences between HR and LR images may degrade performance.

* 11 pages, 11 figures, 2 tables

Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations

Dec 25, 2024

Shingo Ayabe, Takuto Otomo, Hiroshi Kera, Kazuhiko Kawamoto

Abstract:Offline reinforcement learning, which learns solely from datasets without environmental interaction, has gained attention. This approach, similar to traditional online deep reinforcement learning, is particularly promising for robot control applications. Nevertheless, its robustness against real-world challenges, such as joint actuator faults in robots, remains a critical concern. This study evaluates the robustness of existing offline reinforcement learning methods using legged robots from OpenAI Gym based on average episodic rewards. For robustness evaluation, we simulate failures by incorporating both random and adversarial perturbations, representing worst-case scenarios, into the joint torque signals. Our experiments show that existing offline reinforcement learning methods exhibit significant vulnerabilities to these action perturbations and are more vulnerable than online reinforcement learning methods, highlighting the need for more robust approaches in this field.

* 12 pages, 2 figures

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning

Dec 24, 2024

Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto

Abstract:We propose Adapter Merging with Centroid Prototype Mapping (ACMap), an exemplar-free framework for class-incremental learning (CIL) that addresses both catastrophic forgetting and scalability. While existing methods trade off inference time against accuracy, ACMap consolidates task-specific adapters into a single adapter, ensuring constant inference time across tasks without compromising accuracy. The framework employs adapter merging to build a shared subspace that aligns task representations and mitigates forgetting, while centroid prototype mapping maintains high accuracy through consistent adaptation in the shared subspace. To further improve scalability, an early stopping strategy limits adapter merging as tasks increase. Extensive experiments on five benchmark datasets demonstrate that ACMap matches state-of-the-art accuracy while maintaining inference time comparable to the fastest existing methods. The code is available at https://github.com/tf63/ACMap

* 11 pages (main text), 6 pages (supplementary material)

Explaining Object Detectors via Collective Contribution of Pixels

Dec 01, 2024

Toshinori Yamauchi, Hiroshi Kera, Kazuhiko Kawamoto

Abstract:Visual explanations for object detectors are crucial for enhancing their reliability. Since object detectors identify and localize instances by assessing multiple features collectively, generating explanations that capture these collective contributions is critical. However, existing methods focus solely on individual pixel contributions, ignoring the collective contribution of multiple pixels. To address this, we propose a method for object detectors that considers the collective contribution of multiple pixels. Our approach leverages game-theoretic concepts, specifically Shapley values and interactions, to provide explanations. These explanations cover both bounding box generation and class determination, considering both individual and collective pixel contributions. Extensive quantitative and qualitative experiments demonstrate that the proposed method more accurately identifies important regions in detection results compared to current state-of-the-art methods. The code will be publicly available soon.

* 11+14 pages, 15 figures, 8 tables

Low-Quality Image Detection by Hierarchical VAE

Aug 20, 2024

Tomoyasu Nanaumi, Kazuhiko Kawamoto, Hiroshi Kera

Abstract:To make an employee roster, photo album, or training dataset of generative models, one needs to collect high-quality images while dismissing low-quality ones. This study addresses a new task of unsupervised detection of low-quality images. We propose a method that not only detects low-quality images with various types of degradation but also provides visual clues of them based on an observation that partial reconstruction by hierarchical variational autoencoders fails for low-quality images. The experiments show that our method outperforms several unsupervised out-of-distribution detection methods and also gives visual clues for low-quality images that help humans recognize them even in thumbnail view.

* ICCV 2023, Workshop on Uncertainty Estimation for Computer Vision

VarteX: Enhancing Weather Forecast through Distributed Variable Representation

Jun 28, 2024

Ayumu Ueyama, Kazuhiko Kawamoto, Hiroshi Kera

Abstract:Weather forecasting is essential for various human activities. Recent data-driven models based on deep learning have outperformed numerical weather prediction in forecasting performance. However, challenges remain in efficiently handling multiple meteorological variables. This study proposes a new variable aggregation scheme and an efficient learning framework to address that challenge. Experiments show that VarteX outperforms the conventional model in forecast performance while requiring significantly fewer parameters and resources. The effectiveness of learning through multiple aggregations and regional split training is demonstrated, enabling more efficient and accurate deep learning-based weather forecasting.

* ICML 2024, Workshop on Machine Learning for Earth System Modeling

Matching Non-Identical Objects

Mar 18, 2024

Yusuke Marumo, Kazuhiko Kawamoto, Hiroshi Kera

Abstract:Not identical but similar objects are everywhere in the world. Examples include four-legged animals such as dogs and cats, cars of different models, akin flowers in various colors, and countless others. In this study, we address a novel task of matching such non-identical objects. We propose a simple weighting scheme of descriptors that enhances various sparse image matching methods, which were originally designed for matching identical objects captured from different perspectives, and achieve semantically robust matching. The experiments show successful matching between non-identical objects in various cases including domain shift. Further, we present a first evaluation of the robustness of the image matching methods under common corruptions, which is a sort of domain shift, and the proposed method improves the matching in this case as well.

* 10+7 pages, 10 figures, 4 tables

Identifying Important Group of Pixels using Interactions

Jan 08, 2024

Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera

Abstract:To better understand the behavior of image classifiers, it is useful to visualize the contribution of individual pixels to the model prediction. In this study, we propose a method, MoXI (Model eXplanation by Interactions), that efficiently and accurately identifies a group of pixels with high prediction confidence. The proposed method employs game-theoretic concepts, Shapley values and interactions, taking into account the effects of individual pixels and the cooperative influence of pixels on model confidence. Theoretical analysis and experiments demonstrate that our method better identifies the pixels that are highly contributing to the model outputs than widely-used visualization methods using Grad-CAM, Attention rollout, and Shapley value. While prior studies have suffered from the exponential computational cost in the computation of Shapley value and interactions, we show that this can be reduced to linear cost for our task.

* 16 pages, 15 figures
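To make the game-theoretic terms in the abstract above concrete, the following sketch computes exact Shapley values and pairwise Shapley interaction indices for a three-player toy game, where each player stands in for a pixel and the characteristic function stands in for model confidence. It is a minimal sketch under stated assumptions: the interaction is taken to be the standard Shapley interaction index, which may differ from the definition used in the paper, and `toy_confidence` and the other names are hypothetical. The enumeration over all subsets is the exponential-cost computation the abstract refers to; the paper's linear-cost procedure is not reproduced here.

```python
from itertools import combinations
from math import factorial

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def shapley_value(v, players, i):
    """Exact Shapley value of player i for characteristic function v."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        total += weight * (v(s | {i}) - v(s))
    return total

def shapley_interaction(v, players, i, j):
    """Standard pairwise Shapley interaction index between players i and j."""
    n = len(players)
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 2) / factorial(n - 1)
        total += weight * (v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s))
    return total

def toy_confidence(s):
    """Hypothetical 'model confidence' given the set of visible pixels s.

    Pixels 0 and 1 only help when both are visible (a cooperative effect);
    pixel 2 contributes a small amount on its own.
    """
    score = 1.0 if {0, 1} <= s else 0.0
    if 2 in s:
        score += 0.2
    return score

players = [0, 1, 2]
values = [shapley_value(toy_confidence, players, p) for p in players]
print(values)
print(shapley_interaction(toy_confidence, players, 0, 1))
print(shapley_interaction(toy_confidence, players, 0, 2))
```

In this toy game the individual Shapley values sum to the confidence with all pixels visible, and the interaction index is positive only for the pair of pixels that must appear together, which is the kind of cooperative effect that per-pixel attributions alone do not expose.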

Fourier Analysis on Robustness of Graph Convolutional Neural Networks for Skeleton-based Action Recognition

May 29, 2023

Nariki Tanaka, Hiroshi Kera, Kazuhiko Kawamoto

Abstract:Using Fourier analysis, we explore the robustness and vulnerability of graph convolutional neural networks (GCNs) for skeleton-based action recognition. We adopt a joint Fourier transform (JFT), a combination of the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to examine the robustness of adversarially-trained GCNs against adversarial attacks and common corruptions. Experimental results with the NTU RGB+D dataset reveal that adversarial training does not introduce a robustness trade-off between adversarial attacks and low-frequency perturbations, which typically occurs during image classification based on convolutional neural networks. This finding indicates that adversarial training is a practical approach to enhancing robustness against adversarial attacks and common corruptions in skeleton-based action recognition. Furthermore, we find that the Fourier approach cannot explain vulnerability against skeletal part occlusion corruption, which highlights its limitations. These findings extend our understanding of the robustness of GCNs, potentially guiding the development of more robust learning methods for skeleton-based action recognition.

* 17 pages, 13 figures
