Takahiro Omi

Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain

Apr 12, 2024

Kosuke Takahashi, Takahiro Omi, Kosuke Arima, Tatsuya Ishigaki

Abstract: Several previous studies have considered language- and domain-specific large language models (LLMs) as separate topics. This study explores the combination of a non-English language and a high-demand industry domain, focusing on a Japanese business-specific LLM. This type of model requires expertise in the business domain, strong language skills, and regular updates to its knowledge. We trained a 13-billion-parameter LLM from scratch using a new dataset of business texts and patents, and continually pretrained it with the latest business documents. Furthermore, we propose a new benchmark for Japanese business domain question answering (QA) and evaluate our models on it. The results show that our pretrained model improves QA accuracy without losing general knowledge, and that continual pretraining enhances adaptation to new information. Our pretrained model and business domain benchmark are publicly available.

* 9 pages. Preprint of COLM 2024

Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model

Oct 13, 2023

Kosuke Takahashi, Takahiro Omi, Kosuke Arima, Tatsuya Ishigaki

Abstract: This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. Fine-tuning GPT models is common practice in resource-rich languages like English; however, it becomes challenging for non-English languages because question-answer (QA) pairs are scarce. Existing approaches use question and answer generators trained on human-authored QA pairs, which incurs substantial human cost. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data achieves performance comparable to that of a model trained on manually curated datasets, without incurring human costs.

* PACLIC 2023 short paper, 4 pages (6 pages including references), 4 figures
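
As a rough illustration of the zero-shot versus few-shot setup described in the abstract above, the sketch below builds a prompt that asks an instruct-tuned model for one QA pair about a passage, and parses a completion back into a pair. The instruction wording, the `Passage:`/`Question:`/`Answer:` labels, and the function names `build_prompt` and `parse_qa` are all assumptions made for illustration; they are not the prompts or code used in the paper, and the call to the model itself is deliberately left out.

```python
def build_prompt(context, examples=()):
    """Build a prompt asking for one QA pair about `context`.

    With no `examples` the prompt is zero-shot; each (passage, question,
    answer) triple in `examples` adds one few-shot demonstration.
    The template is hypothetical, not taken from the paper.
    """
    parts = ["Write one question and its answer based on the passage."]
    for ex_context, ex_question, ex_answer in examples:
        parts.append(
            f"Passage: {ex_context}\nQuestion: {ex_question}\nAnswer: {ex_answer}"
        )
    # The prompt ends with an open "Question:" slot for the model to complete.
    parts.append(f"Passage: {context}\nQuestion:")
    return "\n\n".join(parts)


def parse_qa(completion):
    """Split a completion of the form '<question> Answer: <answer>'.

    Returns (question, answer), or None if no 'Answer:' label is present.
    """
    question, label, answer = completion.partition("Answer:")
    if not label:
        return None
    return question.strip(), answer.strip()
```

Under these assumptions, each parsed pair would be stored together with its passage as one synthetic training example, and completions that fail to parse would simply be discarded.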

Fully Neural Network based Model for General Temporal Point Processes

May 23, 2019

Takahiro Omi, Naonori Ueda, Kazuyuki Aihara

Abstract: A temporal point process is a mathematical model for a time series of discrete events, which covers various applications. Recently, recurrent neural network (RNN) based models have been developed for point processes and have been found effective. RNN-based models usually assume a specific functional form for the time course of the intensity function of a point process (e.g., exponentially decreasing or increasing with the time since the most recent event). However, such an assumption can restrict the expressive power of the model. We herein propose a novel RNN-based model in which the time course of the intensity function is represented in a general manner. In our approach, we first model the integral of the intensity function using a feedforward neural network and then obtain the intensity function as its derivative. This approach enables us to both obtain a flexible model of the intensity function and exactly evaluate the log-likelihood function, which contains the integral of the intensity function, without any numerical approximations. Our model achieves competitive or superior performance compared to the previous state-of-the-art methods on both synthetic and real datasets.
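
The core idea in the abstract above, modeling the integral of the intensity and differentiating it, can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the paper's architecture: it uses a single hidden layer of softplus units with hand-picked positive weights (`W`, `B`, `V` are hypothetical values), it drops the RNN history embedding so the intensity depends only on the time since the last event, and it writes the derivative analytically instead of using automatic differentiation. Positive weights keep the cumulative intensity increasing, so its derivative is a valid (positive) intensity.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function (the derivative of softplus).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical parameters of a tiny one-hidden-layer network.
# W and V must be positive so that the cumulative intensity is increasing.
W = [0.5, 1.0, 2.0]
B = [-1.0, 0.0, 0.5]
V = [0.3, 0.2, 0.1]


def cumulative_intensity(tau):
    # Phi(tau): integral of the intensity from 0 to tau; Phi(0) = 0 by construction.
    return sum(v * (softplus(w * tau + b) - softplus(b)) for w, b, v in zip(W, B, V))


def intensity(tau):
    # lambda(tau) = dPhi/dtau, written out analytically for this network.
    return sum(v * w * sigmoid(w * tau + b) for w, b, v in zip(W, B, V))


def log_likelihood(intervals):
    # Each inter-event interval tau contributes log lambda(tau) - Phi(tau).
    # Both terms come from the same network, so no numerical integration is needed.
    return sum(math.log(intensity(t)) - cumulative_intensity(t) for t in intervals)
```

For example, `log_likelihood([0.5, 1.2, 0.3])` scores three inter-event intervals. In a trainable version the parameters would be learned by maximizing this quantity, with the positivity constraint enforced during optimization.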
