Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Miki Haseyama

StarMAP: Global Neighbor Embedding for Faithful Data Visualization

Feb 06, 2025

Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Abstract:Neighbor embedding is widely employed to visualize high-dimensional data; however, it frequently overlooks the global structure, e.g., intercluster similarities, thereby impeding accurate visualization. To address this problem, this paper presents Star-attracted Manifold Approximation and Projection (StarMAP), which incorporates the advantage of principal component analysis (PCA) in neighbor embedding. Inspired by the property of PCA embedding, which can be viewed as the largest shadow of the data, StarMAP introduces the concept of \textit{star attraction} by leveraging the PCA embedding. This approach yields faithful global structure preservation while maintaining the interpretability and computational efficiency of neighbor embedding. StarMAP was compared with existing methods in the visualization tasks of toy datasets, single-cell RNA sequencing data, and deep representation. The experimental results show that StarMAP is simple but effective in realizing faithful visualizations.

Via

Access Paper or Ask Questions

Continual Self-supervised Learning Considering Medical Domain Knowledge in Chest CT Images

Jan 08, 2025

Ren Tasai, Guang Li, Ren Togo, Minghui Tang, Takaaki Yoshimura, Hiroyuki Sugimori, Kenji Hirata, Takahiro Ogawa, Kohsuke Kudo, Miki Haseyama

Figure 1 for Continual Self-supervised Learning Considering Medical Domain Knowledge in Chest CT Images

Figure 2 for Continual Self-supervised Learning Considering Medical Domain Knowledge in Chest CT Images

Figure 3 for Continual Self-supervised Learning Considering Medical Domain Knowledge in Chest CT Images

Figure 4 for Continual Self-supervised Learning Considering Medical Domain Knowledge in Chest CT Images

Abstract:We propose a novel continual self-supervised learning method (CSSL) considering medical domain knowledge in chest CT images. Our approach addresses the challenge of sequential learning by effectively capturing the relationship between previously learned knowledge and new information at different stages. By incorporating an enhanced DER into CSSL and maintaining both diversity and representativeness within the rehearsal buffer of DER, the risk of data interference during pretraining is reduced, enabling the model to learn more richer and robust feature representations. In addition, we incorporate a mixup strategy and feature distillation to further enhance the model's ability to learn meaningful representations. We validate our method using chest CT images obtained under two different imaging conditions, demonstrating superior performance compared to state-of-the-art methods.

* Accepted by ICASSP 2025

Via

Access Paper or Ask Questions

Generative Dataset Distillation Based on Self-knowledge Distillation

Jan 08, 2025

Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Figure 1 for Generative Dataset Distillation Based on Self-knowledge Distillation

Figure 2 for Generative Dataset Distillation Based on Self-knowledge Distillation

Figure 3 for Generative Dataset Distillation Based on Self-knowledge Distillation

Figure 4 for Generative Dataset Distillation Based on Self-knowledge Distillation

Abstract:Dataset distillation is an effective technique for reducing the cost and complexity of model training while maintaining performance by compressing large datasets into smaller, more efficient versions. In this paper, we present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits. Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data, thereby capturing the overall structure and relationships within the data. To further improve the accuracy of alignment, we introduce a standardization step on the logits before performing distribution matching, ensuring consistency in the range of logits. Through extensive experiments, we demonstrate that our method outperforms existing state-of-the-art methods, resulting in superior distillation performance.

* Accepted by ICASSP 2025

Via

Access Paper or Ask Questions

LLM is Knowledge Graph Reasoner: LLM's Intuition-aware Knowledge Graph Reasoning for Cold-start Sequential Recommendation

Dec 17, 2024

Keigo Sakurai, Ren Togo, Takahiro Ogawa, Miki Haseyama

Figure 1 for LLM is Knowledge Graph Reasoner: LLM's Intuition-aware Knowledge Graph Reasoning for Cold-start Sequential Recommendation

Figure 2 for LLM is Knowledge Graph Reasoner: LLM's Intuition-aware Knowledge Graph Reasoning for Cold-start Sequential Recommendation

Figure 3 for LLM is Knowledge Graph Reasoner: LLM's Intuition-aware Knowledge Graph Reasoning for Cold-start Sequential Recommendation

Figure 4 for LLM is Knowledge Graph Reasoner: LLM's Intuition-aware Knowledge Graph Reasoning for Cold-start Sequential Recommendation

Abstract:Knowledge Graphs (KGs) represent relationships between entities in a graph structure and have been widely studied as promising tools for realizing recommendations that consider the accurate content information of items. However, traditional KG-based recommendation methods face fundamental challenges: insufficient consideration of temporal information and poor performance in cold-start scenarios. On the other hand, Large Language Models (LLMs) can be considered databases with a wealth of knowledge learned from the web data, and they have recently gained attention due to their potential application as recommendation systems. Although approaches that treat LLMs as recommendation systems can leverage LLMs' high recommendation literacy, their input token limitations make it impractical to consider the entire recommendation domain dataset and result in scalability issues. To address these challenges, we propose a LLM's Intuition-aware Knowledge graph Reasoning model (LIKR). Our main idea is to treat LLMs as reasoners that output intuitive exploration strategies for KGs. To integrate the knowledge of LLMs and KGs, we trained a recommendation agent through reinforcement learning using a reward function that integrates different recommendation strategies, including LLM's intuition and KG embeddings. By incorporating temporal awareness through prompt engineering and generating textual representations of user preferences from limited interactions, LIKR can improve recommendation performance in cold-start scenarios. Furthermore, LIKR can avoid scalability issues by using KGs to represent recommendation domain datasets and limiting the LLM's output to KG exploration strategies. Experiments on real-world datasets demonstrate that our model outperforms state-of-the-art recommendation methods in cold-start sequential recommendation scenarios.

* Accepted to the 47th European Conference on Information Retrieval (ECIR2025)

Via

Access Paper or Ask Questions

Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

Dec 10, 2024

Wenbo Huang, Jinghui Zhang, Guang Li, Lei Zhang, Shuoyuan Wang, Fang Dong, Jiahui Jin, Takahiro Ogawa, Miki Haseyama

Abstract:In few-shot action recognition~(FSAR), long sub-sequences of video naturally express entire actions more effectively. However, the computational complexity of mainstream Transformer-based methods limits their application. Recent Mamba demonstrates efficiency in modeling long sequences, but directly applying Mamba to FSAR overlooks the importance of local feature modeling and alignment. Moreover, long sub-sequences within the same class accumulate intra-class variance, which adversely impacts FSAR performance. To solve these challenges, we propose a \underline{\textbf{M}}atryoshka M\underline{\textbf{A}}mba and Co\underline{\textbf{N}}tras\underline{\textbf{T}}ive Le\underline{\textbf{A}}rning framework~(\textbf{Manta}). Firstly, the Matryoshka Mamba introduces multiple Inner Modules to enhance local feature representation, rather than directly modeling global features. An Outer Module captures dependencies of timeline between these local features for implicit temporal alignment. Secondly, a hybrid contrastive learning paradigm, combining both supervised and unsupervised methods, is designed to mitigate the negative effects of intra-class variance accumulation. The Matryoshka Mamba and the hybrid contrastive learning paradigm operate in parallel branches within Manta, enhancing Mamba for FSAR of long sub-sequence. Manta achieves new state-of-the-art performance on prominent benchmarks, including SSv2, Kinetics, UCF101, and HMDB51. Extensive empirical studies prove that Manta significantly improves FSAR of long sub-sequence from multiple perspectives. The code is released at https://github.com/wenbohuang1002/Manta.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering

Oct 23, 2024

He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract:Conventional medical artificial intelligence (AI) models face barriers in clinical application and ethical issues owing to their inability to handle the privacy-sensitive characteristics of medical data. We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models, addressing privacy reliability challenges in the medical domain. Our method introduces learnable prompts into a Transformer architecture to efficiently train it on diverse medical datasets without massive computational costs. Then we introduce a reliable client VQA model that incorporates Dempster-Shafer evidence theory to quantify uncertainty in predictions, enhancing the model's reliability. Furthermore, we propose a novel inter-client communication mechanism that uses maximum likelihood estimation to balance accuracy and uncertainty, fostering efficient integration of insights across clients.

Via

Access Paper or Ask Questions

Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation

Oct 22, 2024

Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Abstract:Dimensionality reduction (DR) offers a useful representation of complex high-dimensional data. Recent DR methods focus on hyperbolic geometry to derive a faithful low-dimensional representation of hierarchical data. However, existing methods are based on neighbor embedding, frequently ruining the continual relation of the hierarchies. This paper presents hyperboloid Gaussian process (GP) latent variable models (hGP-LVMs) to embed high-dimensional hierarchical data with implicit continuity via nonparametric estimation. We adopt generative modeling using the GP, which brings effective hierarchical embedding and executes ill-posed hyperparameter tuning. This paper presents three variants that employ original point, sparse point, and Bayesian estimations. We establish their learning algorithms by incorporating the Riemannian optimization and active approximation scheme of GP-LVM. For Bayesian inference, we further introduce the reparameterization trick to realize Bayesian latent variable learning. In the last part of this paper, we apply hGP-LVMs to several datasets and show their ability to represent high-dimensional hierarchies in low-dimensional spaces.

Via

Access Paper or Ask Questions

Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

Sep 03, 2024

Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Figure 1 for Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

Figure 2 for Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

Figure 3 for Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

Figure 4 for Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

Abstract:We propose a new strategy called think twice before recognizing to improve fine-grained traffic sign recognition (TSR). Fine-grained TSR in the wild is difficult due to the complex road conditions, and existing approaches particularly struggle with cross-country TSR when data is lacking. Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multimodal models (LMM). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for the LMM. The context descriptions with center coordinate prompt optimization help the LMM to locate the target traffic sign in the original road images containing multiple traffic signs and filter irrelevant answers through the proposed prior traffic sign hypothesis. The characteristic description is based on few-shot in-context learning of template traffic signs, which decreases the cross-domain difference and enhances the fine-grained recognition capability of the LMM. The differential descriptions of similar traffic signs optimize the multimodal thinking capability of the LMM. The proposed method is independent of training data and requires only simple and uniform instructions. We conducted extensive experiments on three benchmark datasets and two real-world datasets from different countries, and the proposed method achieves state-of-the-art TSR results on all five datasets.

Via

Access Paper or Ask Questions

MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT

Sep 02, 2024

Jinlong Zhu, Keigo Sakurai, Ren Togo, Takahiro Ogawa, Miki Haseyama

Abstract:We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation. The main theme of symbolic music generation primarily encompasses the preprocessing of music data and the implementation of a deep learning framework. Current techniques dedicated to symbolic music generation generally encounter two significant challenges: training data's lack of information about chords and scales and the requirement of specially designed model architecture adapted to the unique format of symbolic music representation. In this paper, we solve the above problems by introducing new symbolic music representation with MusicLang chord analysis model. We propose our MMT-BERT architecture adapting to the representation. To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator, and incorporate relativistic standard loss. This approach, supported by the in-depth understanding of symbolic music encoded within MusicBERT, fortifies the consonance and humanity of music generated by our method. Experimental results demonstrate the effectiveness of our approach which strictly follows the state-of-the-art methods.

* Accepted to the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)

Via

Access Paper or Ask Questions

Generative Dataset Distillation Based on Diffusion Model

Aug 16, 2024

Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama

Figure 1 for Generative Dataset Distillation Based on Diffusion Model

Figure 2 for Generative Dataset Distillation Based on Diffusion Model

Abstract:This paper presents our method for the generative track of The First Dataset Distillation Challenge at ECCV 2024. Since the diffusion model has become the mainstay of generative models because of its high-quality generative effects, we focus on distillation methods based on the diffusion model. Considering that the track can only generate a fixed number of images in 10 minutes using a generative model for CIFAR-100 and Tiny-ImageNet datasets, we need to use a generative model that can generate images at high speed. In this study, we proposed a novel generative dataset distillation method based on Stable Diffusion. Specifically, we use the SDXL-Turbo model which can generate images at high speed and quality. Compared to other diffusion models that can only generate images per class (IPC) = 1, our method can achieve an IPC = 10 for Tiny-ImageNet and an IPC = 20 for CIFAR-100, respectively. Additionally, to generate high-quality distilled datasets for CIFAR-100 and Tiny-ImageNet, we use the class information as text prompts and post data augmentation for the SDXL-Turbo model. Experimental results show the effectiveness of the proposed method, and we achieved third place in the generative track of the ECCV 2024 DD Challenge. Codes are available at https://github.com/Guang000/BANKO.

* The Third Place Winner in Generative Track of the ECCV 2024 DD Challenge

Via

Access Paper or Ask Questions