Tatsuya Ishigaki

Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain

Apr 12, 2024

Kosuke Takahashi, Takahiro Omi, Kosuke Arima, Tatsuya Ishigaki

Abstract: Several previous studies have considered language- and domain-specific large language models (LLMs) as separate topics. This study explores the combination of a non-English language and a high-demand industry domain, focusing on a Japanese business-specific LLM. This type of model requires expertise in the business domain, strong language skills, and regular updates of its knowledge. We trained a 13-billion-parameter LLM from scratch using a new dataset of business texts and patents, and continually pretrained it with the latest business documents. Furthermore, we propose a new benchmark for Japanese business domain question answering (QA) and evaluate our models on it. The results show that our pretrained model improves QA accuracy without losing general knowledge, and that continual pretraining enhances adaptation to new information. Our pretrained model and business domain benchmark are publicly available.

* 9 pages. Preprint of COLM 2024


Prompting for Numerical Sequences: A Case Study on Market Comment Generation

Apr 03, 2024

Masayuki Kawarada, Tatsuya Ishigaki, Hiroya Takamura

Abstract: Large language models (LLMs) have been applied to a wide range of data-to-text generation tasks, including tables, graphs, and time-series numerical data-to-text settings. While research on generating prompts for structured data such as tables and graphs is gaining momentum, in-depth investigations into prompting for time-series numerical data are lacking. Therefore, this study explores various input representations, including sequences of tokens and structured formats such as HTML, LaTeX, and Python-style code. In our experiments, we focus on the task of Market Comment Generation, which involves taking a numerical sequence of stock prices as input and generating a corresponding market comment. Contrary to our expectations, the results show that prompts resembling programming languages yield better outcomes, whereas those similar to natural languages and longer formats, such as HTML and LaTeX, are less effective. Our findings offer insights into creating effective prompts for tasks that generate text from numerical sequences.

* Accepted to LREC-COLING2024 long paper
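
The input representations named in the abstract above can be illustrated with a minimal sketch. The function names, the two-decimal formatting, and the prompt wording are all assumptions made for illustration; they are not the paper's actual templates.

```python
from typing import Sequence


def _fmt(prices: Sequence[float]) -> list:
    # Two-decimal formatting is an arbitrary choice for this sketch.
    return [f"{p:.2f}" for p in prices]


def as_token_sequence(prices: Sequence[float]) -> str:
    # Plain whitespace-separated numbers.
    return " ".join(_fmt(prices))


def as_python_list(prices: Sequence[float], name: str = "prices") -> str:
    # Python-style assignment; the variable name is hypothetical.
    return f"{name} = [" + ", ".join(_fmt(prices)) + "]"


def as_html_table(prices: Sequence[float]) -> str:
    # A single-row HTML table, one cell per price.
    cells = "".join(f"<td>{v}</td>" for v in _fmt(prices))
    return f"<table><tr>{cells}</tr></table>"


def as_latex_tabular(prices: Sequence[float]) -> str:
    # A single-row LaTeX tabular with right-aligned columns.
    cols = "r" * len(prices)
    row = " & ".join(_fmt(prices))
    return "\\begin{tabular}{" + cols + "} " + row + " \\\\ \\end{tabular}"


def build_prompt(serialized: str) -> str:
    # Hypothetical instruction wording, not taken from the paper.
    return f"Stock prices:\n{serialized}\nWrite a short market comment."


if __name__ == "__main__":
    series = [100.5, 101.0, 99.75]
    for serialize in (as_token_sequence, as_python_list,
                      as_html_table, as_latex_tabular):
        print(build_prompt(serialize(series)))
        print()
```

The same series produces prompts of quite different lengths across these formats, which is one of the factors the abstract mentions when comparing them.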


Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model

Oct 13, 2023

Kosuke Takahashi, Takahiro Omi, Kosuke Arima, Tatsuya Ishigaki

Abstract: This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. Fine-tuning GPT models for this purpose is common practice in resource-rich languages like English; however, it becomes challenging for non-English languages due to the scarcity of question-answer (QA) pairs. Existing approaches use question and answer generators trained on human-authored QA pairs, which incurs substantial human cost. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data achieves comparable performance to a model trained on manually curated datasets, without incurring human costs.

* PACLIC 2023 short paper, 4 pages (6 pages including references), 4 figures
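
As a rough illustration of the zero-shot and few-shot setup described in the abstract above, the sketch below builds a prompt for an instruct-tuned model and parses its completion into a QA pair. The prompt wording, the `Q:`/`A:` layout, and both function names are assumptions; the paper's actual prompts may differ, and no model call is made here.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical instruction text, not taken from the paper.
INSTRUCTION = "Read the passage and write one question and its answer."


def build_qa_prompt(context: str,
                    examples: Iterable[Tuple[str, str, str]] = ()) -> str:
    """Build a prompt; with no examples it is zero-shot, otherwise few-shot.

    Each example is a (passage, question, answer) triple.
    """
    parts = [INSTRUCTION]
    for ex_context, question, answer in examples:
        parts.append(f"Passage: {ex_context}\nQ: {question}\nA: {answer}")
    # The prompt ends at "Q:" so the model continues with a question.
    parts.append(f"Passage: {context}\nQ:")
    return "\n\n".join(parts)


def parse_qa(completion: str) -> Optional[Tuple[str, str]]:
    """Split a completion of the form '<question>\\nA: <answer>'.

    Returns None when the expected 'A:' marker or either field is missing,
    so malformed generations can be filtered out of the synthetic set.
    """
    question, sep, answer = completion.partition("\nA:")
    if not sep:
        return None
    question, answer = question.strip(), answer.strip()
    return (question, answer) if question and answer else None


if __name__ == "__main__":
    prompt = build_qa_prompt("Tokyo is the capital of Japan.")
    print(prompt)
    print(parse_qa(" What is the capital of Japan?\nA: Tokyo"))
```

Pairs that survive parsing would then serve as training data for the downstream QA model; any quality filtering beyond this format check is left out of the sketch.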


Learning to Select, Track, and Generate for Data-to-Text

Jul 23, 2019

Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, Hiroya Takamura

Abstract: We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which record has been mentioned. Our generation module generates a summary conditioned on the state of the tracking module. Our model is considered to simulate the human-like writing process that gradually selects the information by determining the intermediate variables while writing the summary. In addition, we explore the effectiveness of writer information for generation. Experimental results show that our model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.

* ACL 2019
