Kosuke Nishida

Zero-shot Concept Bottleneck Models

Feb 13, 2025

Shin'ya Yamaguchi, Kosuke Nishida, Daiki Chijiwa, Yasutoshi Ida

Abstract: Concept bottleneck models (CBMs) are inherently interpretable and intervenable neural network models, which explain their final label prediction by the intermediate prediction of high-level semantic concepts. However, they require target task training to learn input-to-concept and concept-to-label mappings, incurring the cost of target dataset collection and training resources. In this paper, we present zero-shot concept bottleneck models (Z-CBMs), which predict concepts and labels in a fully zero-shot manner without training neural networks. Z-CBMs utilize a large-scale concept bank, which is composed of millions of vocabulary items extracted from the web, to describe arbitrary input in various domains. For the input-to-concept mapping, we introduce concept retrieval, which dynamically finds input-related concepts by cross-modal search on the concept bank. In the concept-to-label inference, we apply concept regression to select essential concepts from the retrieved concepts by sparse linear regression. Through extensive experiments, we confirm that our Z-CBMs provide interpretable and intervenable concepts without any additional training. Code will be available at https://github.com/yshinya6/zcbm.

* 14 pages, 8 figures
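The two steps named in the abstract, concept retrieval and concept regression, can be illustrated with a minimal sketch. It is not the authors' implementation: the 3-dimensional embeddings, the names `concept_bank`, `label_bank`, and `image_emb`, the coordinate-descent lasso solver, and the cosine-based label step are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_concepts(query, bank, k):
    """Concept retrieval: names of the k concepts most similar to the query."""
    ranked = sorted(bank, key=lambda name: cosine(query, bank[name]), reverse=True)
    return ranked[:k]

def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal step of the L1 penalty)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def sparse_regression(target, columns, lam=0.1, n_iter=100):
    """Lasso by coordinate descent:
    minimise 0.5 * ||target - sum_j w[j] * columns[j]||^2 + lam * ||w||_1."""
    w = [0.0] * len(columns)
    residual = list(target)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            sq = sum(c * c for c in col)
            if sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual))
            new_w = soft_threshold(rho, lam) / sq
            residual = [r + (w[j] - new_w) * c for r, c in zip(residual, col)]
            w[j] = new_w
    return w

# Toy "embeddings"; every value here is made up for illustration.
concept_bank = {
    "stripes": [1.0, 0.0, 0.0],
    "four legs": [0.0, 1.0, 0.0],
    "wings": [0.0, 0.0, 1.0],
    "metallic": [-1.0, 0.0, 0.0],
}
label_bank = {"zebra": [0.7, 0.7, 0.0], "bird": [0.0, 0.1, 1.0]}
image_emb = [0.8, 0.6, 0.0]

retrieved = retrieve_concepts(image_emb, concept_bank, k=3)
weights = sparse_regression(image_emb, [concept_bank[c] for c in retrieved])
selected = {c: w for c, w in zip(retrieved, weights) if abs(w) > 1e-9}

# Label step (assumption): compare the weighted concept sum with label embeddings.
recon = [sum(w * concept_bank[c][i] for c, w in selected.items()) for i in range(3)]
label = max(label_bank, key=lambda name: cosine(recon, label_bank[name]))
```

In this toy setup the L1 penalty drives the weight of the irrelevant retrieved concept to zero, which is the sense in which sparse regression "selects essential concepts". Setting an entry of `selected` to zero by hand and recomputing `label` would mimic a concept intervention.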

Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes

Oct 07, 2024

Kosuke Nishida, Kyosuke Nishida, Kuniko Saito

Abstract: Loss spikes, a phenomenon in which the loss value diverges suddenly, are a fundamental issue in the pre-training of large language models. This paper hypothesizes that the non-uniformity of the norm of the parameters is one of the causes of loss spikes. In the training of neural networks, the scale of the gradients must be kept constant throughout the layers to avoid the vanishing and exploding gradients problem. However, to meet this requirement in the Transformer model, the norm of the model parameters must be non-uniform, and thus parameters whose norm is smaller are more sensitive to the parameter update. To address this issue, we propose a novel technique, weight scaling as reparameterization (WeSaR). WeSaR introduces a gate parameter per parameter matrix and adjusts it to the value satisfying the requirement. Because of the gate parameter, WeSaR sets the norm of the original parameters uniformly, which results in stable training. Experimental results with Transformer decoders consisting of 130 million, 1.3 billion, and 13 billion parameters showed that WeSaR stabilizes and accelerates training and that it outperformed the compared methods, including popular initialization methods.

* EMNLP2024 accepted
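The reparameterization idea in the abstract can be sketched as follows, assuming the effective weight is the product of one scalar gate and a raw matrix. The class name `GatedMatrix`, the standard deviations, and the matrix sizes are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random

class GatedMatrix:
    """A weight matrix stored as gate * raw, with one scalar gate per matrix."""

    def __init__(self, rows, cols, target_std, common_std, rng):
        # Every raw matrix is drawn with the same standard deviation ...
        self.raw = [[rng.gauss(0.0, common_std) for _ in range(cols)]
                    for _ in range(rows)]
        # ... and the gate carries the scale that this particular matrix needs.
        self.gate = target_std / common_std

    def effective(self):
        """The weights actually used in the forward pass."""
        return [[self.gate * x for x in row] for row in self.raw]

def std(matrix):
    """Population standard deviation over all entries of a matrix."""
    values = [x for row in matrix for x in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

rng = random.Random(0)
common_std = 0.01  # assumed shared scale of the raw parameters
wide = GatedMatrix(64, 64, target_std=0.02, common_std=common_std, rng=rng)
narrow = GatedMatrix(64, 64, target_std=0.002, common_std=common_std, rng=rng)

raw_stds = (std(wide.raw), std(narrow.raw))
effective_stds = (std(wide.effective()), std(narrow.effective()))
```

Both raw matrices end up with roughly the same scale, while the effective weights differ by a factor of ten. In training, the gate would be a learnable parameter updated alongside the raw matrix, which this sketch leaves out.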

Explanation Bottleneck Models

Sep 26, 2024

Shin'ya Yamaguchi, Kosuke Nishida

Abstract: Recent concept-based interpretable models have succeeded in providing meaningful explanations using pre-defined concept sets. However, the dependency on pre-defined concepts restricts their application because of the limited number of concepts available for explanations. This paper proposes a novel interpretable deep neural network called explanation bottleneck models (XBMs). XBMs generate a text explanation from the input without pre-defined concepts and then make the final task prediction based on the generated explanation by leveraging pre-trained vision-language encoder-decoder models. To achieve both target task performance and explanation quality, we train XBMs through the target task loss with a regularization term that penalizes the explanation decoder via distillation from the frozen pre-trained decoder. Our experiments, including a comparison to state-of-the-art concept bottleneck models, confirm that XBMs provide accurate and fluent natural language explanations without pre-defined concept sets. Code will be available at https://github.com/yshinya6/xbm/.

* 13 pages, 4 figures

Robust Text-driven Image Editing Method that Adaptively Explores Directions in Latent Spaces of StyleGAN and CLIP

Apr 03, 2023

Tsuyoshi Baba, Kosuke Nishida, Kyosuke Nishida

Abstract: Automatic image editing is in great demand because of its numerous applications, and the use of natural language instructions is essential to achieving flexible and intuitive editing as the user imagines. A pioneering work in text-driven image editing, StyleCLIP, finds an edit direction in the CLIP space and then edits the image by mapping the direction to the StyleGAN space. However, it is difficult to tune appropriate inputs other than the original image and text instructions for image editing. In this study, we propose a method to construct the edit direction adaptively in the StyleGAN and CLIP spaces with an SVM. Our model represents the edit direction as a normal vector in the CLIP space obtained by training an SVM to classify positive and negative images. The images are retrieved from a large-scale image corpus, originally used for pre-training StyleGAN, according to the CLIP similarity between the images and the text instruction. We confirmed that our model performed as well as the StyleCLIP baseline, while allowing simple inputs without increasing the computational time.
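A minimal sketch of the core idea, taking the unit normal of a linear SVM's separating hyperplane as an edit direction, might look like this. The 2-dimensional points stand in for CLIP embeddings, the full-batch subgradient solver replaces whatever SVM solver the authors used, and the mapping into the StyleGAN space is omitted; all names and values are assumptions.

```python
import math

def train_linear_svm(points, labels, lr=0.1, reg=0.01, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Made-up embeddings: "positive" images match the text instruction, "negative" ones do not.
positives = [[0.9, 0.1], [1.0, -0.2], [0.8, 0.3]]
negatives = [[-0.9, 0.2], [-1.0, -0.1], [-0.8, -0.3]]
points = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_linear_svm(points, labels)
edit_direction = unit(w)  # normal vector of the separating hyperplane

def score(x):
    """Signed SVM decision value for an embedding."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Moving an embedding along the direction pushes it toward the positive side.
source = [-0.5, 0.0]
edited = [s + 1.5 * d for s, d in zip(source, edit_direction)]
```

The step size `1.5` is arbitrary. In the abstract's setting, the positive and negative sets would come from ranking corpus images by CLIP similarity to the text instruction.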

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Jan 12, 2023

Ryota Tanaka, Kyosuke Nishida, Kosuke Nishida, Taku Hasegawa, Itsumi Saito, Kuniko Saito

Abstract: Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems, most of the existing datasets focus on understanding the content relationships within a single image and not across multiple images. In this study, we propose a new multi-image document VQA dataset, SlideVQA, containing 2.6k+ slide decks composed of 52k+ slide images and 14.5k questions about a slide deck. SlideVQA requires complex reasoning, including single-hop, multi-hop, and numerical reasoning, and also provides annotated arithmetic expressions of numerical answers for enhancing the ability of numerical reasoning. Moreover, we developed a new end-to-end document VQA model that treats evidence selection and question answering in a unified sequence-to-sequence format. Experiments on SlideVQA show that our model outperformed existing state-of-the-art QA models but still lags far behind human performance. We believe that our dataset will facilitate research on document VQA.

* Accepted by AAAI2023

Self-Adaptive Named Entity Recognition by Retrieving Unstructured Knowledge

Oct 14, 2022

Kosuke Nishida, Naoki Yoshinaga, Kyosuke Nishida

Abstract: Although named entity recognition (NER) helps us to extract various domain-specific entities from text (e.g., artists in the music domain), it is costly to create a large amount of training data or a structured knowledge base to perform accurate NER in the target domain. Here, we propose self-adaptive NER, where the model retrieves external knowledge from unstructured text to learn the usage of entities that have not been learned well. To retrieve useful knowledge for NER, we design an effective two-stage model that retrieves unstructured knowledge using uncertain entities as queries. Our model first predicts the entities in the input and then finds the entities whose predictions are not confident. Then, our model retrieves knowledge by using these uncertain entities as queries and concatenates the retrieved text to the original input to revise the prediction. Experiments on CrossNER datasets demonstrated that our model outperforms the strong NERBERT baseline by 2.45 points on average.
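The two-stage flow described above (predict, pick low-confidence entities, retrieve, concatenate) can be sketched with plain data structures. The first-pass predictions, the confidence threshold, the substring-based retriever, and the `[SEP]` separator are hypothetical stand-ins for illustration; the paper's actual tagger and retriever are not reproduced here.

```python
def uncertain_entities(predictions, threshold=0.6):
    """Stage 1 output filter: entity strings whose confidence is below the threshold."""
    return [p["text"] for p in predictions if p["confidence"] < threshold]

def retrieve(corpus, queries, per_query=1):
    """Toy retriever: sentences that mention a query entity (case-insensitive)."""
    hits = []
    for query in queries:
        matched = [s for s in corpus if query.lower() in s.lower()]
        for sentence in matched[:per_query]:
            if sentence not in hits:
                hits.append(sentence)
    return hits

def augment_input(text, retrieved, sep=" [SEP] "):
    """Stage 2 input: the original text followed by the retrieved sentences."""
    return sep.join([text] + retrieved)

# Made-up first-pass predictions from a hypothetical NER tagger.
text = "Aimer released a new single with Vaundy."
first_pass = [
    {"text": "Aimer", "label": "artist", "confidence": 0.35},
    {"text": "Vaundy", "label": "artist", "confidence": 0.92},
]
corpus = [
    "Aimer is a Japanese singer and lyricist.",
    "Vaundy is a singer-songwriter from Tokyo.",
    "The stadium opened in 1998.",
]

queries = uncertain_entities(first_pass)
knowledge = retrieve(corpus, queries)
second_pass_input = augment_input(text, knowledge)
```

The string in `second_pass_input` would be fed back to the tagger to revise the first-pass labels. Only the uncertain entity triggers retrieval, so confident predictions add no extra context.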

Improving Few-Shot Image Classification Using Machine- and User-Generated Natural Language Descriptions

Jul 07, 2022

Kosuke Nishida, Kyosuke Nishida, Shuichi Nishioka

Abstract: Humans can acquire knowledge of novel visual concepts from language descriptions, and we thus use the few-shot image classification task to investigate whether a machine learning model can have this capability. Our proposed model, LIDE (Learning from Image and DEscription), has a text decoder to generate the descriptions and a text encoder to obtain the text representations of machine- or user-generated descriptions. We confirmed that LIDE with machine-generated descriptions outperformed baseline models. Moreover, the performance was improved further with high-quality user-generated descriptions. The generated descriptions can be viewed as explanations of the model's predictions, and we observed that such explanations were consistent with the prediction results. We also investigated why the language descriptions improved the few-shot image classification performance by comparing the image representations and the text representations in the feature spaces.

* Findings of NAACL2022

Towards Interpretable and Reliable Reading Comprehension: A Pipeline Model with Unanswerability Prediction

Nov 18, 2021

Kosuke Nishida, Kyosuke Nishida, Itsumi Saito, Sen Yoshida

Abstract: Multi-hop QA with annotated supporting facts, which is the task of reading comprehension (RC) considering the interpretability of the answer, has been extensively studied. In this study, we define an interpretable reading comprehension (IRC) model as a pipeline model with the capability of predicting unanswerable queries. The IRC model justifies the answer prediction by establishing consistency between the predicted supporting facts and the actual rationale for interpretability. To ensure the reliability of the answer, the IRC model detects unanswerable questions instead of forcibly outputting an answer based on insufficient information. We also propose an end-to-end training method for the pipeline RC model. To evaluate interpretability and reliability, we conducted experiments considering unanswerability in a multi-hop question for a given passage. We show that our end-to-end trainable pipeline model outperformed a non-interpretable model on our modified HotpotQA dataset. Experimental results also show that the IRC model achieves comparable results to the previous non-interpretable models in spite of the trade-off between prediction performance and interpretability.

* International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1-8
* IJCNN 2021 (https://ieeexplore.ieee.org/abstract/document/9534370)

Task-adaptive Pre-training of Language Models with Word Embedding Regularization

Sep 17, 2021

Kosuke Nishida, Kyosuke Nishida, Sen Yoshida

Abstract: Pre-trained language models (PTLMs) acquire domain-independent linguistic knowledge through pre-training with massive textual resources. Additional pre-training is effective in adapting PTLMs to domains that are not well covered by the pre-training corpora. Here, we focus on the static word embeddings of PTLMs for domain adaptation to teach PTLMs domain-specific meanings of words. We propose a novel fine-tuning process: task-adaptive pre-training with word embedding regularization (TAPTER). TAPTER runs additional pre-training by making the static word embeddings of a PTLM close to the word embeddings obtained in the target domain with fastText. TAPTER requires no additional corpus except for the training data of the downstream task. We confirmed that TAPTER improves the performance of the standard fine-tuning and the task-adaptive pre-training on BioASQ (question answering in the biomedical domain) and on SQuAD (the Wikipedia domain) when their pre-training corpora were not dominated by in-domain data.

* ACL Findings 2021

Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models

Mar 29, 2020

Itsumi Saito, Kyosuke Nishida, Kosuke Nishida, Junji Tomita

Abstract: Pre-trained sequence-to-sequence (seq-to-seq) models have significantly improved the accuracy of several language generation tasks, including abstractive summarization. Although the fluency of abstractive summarization has been greatly improved by fine-tuning these models, it is not clear whether they can also identify the important parts of the source text to be included in the summary. In this study, we investigated the effectiveness of combining saliency models that identify the important parts of the source text with the pre-trained seq-to-seq models through extensive experiments. We also proposed a new combination model consisting of a saliency model that extracts a token sequence from a source text and a seq-to-seq model that takes the sequence as an additional input text. Experimental results showed that most of the combination models outperformed a simple fine-tuned seq-to-seq model on both the CNN/DM and XSum datasets, even when the seq-to-seq model was pre-trained on large-scale corpora. Moreover, for the CNN/DM dataset, the proposed combination model exceeded the previous best-performing model by 1.33 points on ROUGE-L.

* Work in progress
