Mamoru Komachi

Analyzing Continuous Semantic Shifts with Diachronic Word Similarity Matrices

Jan 16, 2025

Hajime Kiyama, Taichi Aida, Mamoru Komachi, Toshinobu Ogiso, Hiroya Takamura, Daichi Mochihashi

Abstract: The meanings and relationships of words shift over time, a phenomenon referred to as semantic shift. Understanding how semantic shifts occur over multiple time periods is essential for analyzing them in detail. However, detecting change points only between adjacent time periods is insufficient for such analysis, and using BERT-based methods to examine word sense proportions incurs a high computational cost. To address these issues, we propose a simple yet intuitive framework for analyzing how semantic shifts occur over multiple time periods by leveraging a similarity matrix between the embeddings of the same word through time. We compute a diachronic word similarity matrix using fast and lightweight word embeddings across arbitrary time periods, enabling deeper analysis of continuous semantic shifts. Additionally, by clustering the similarity matrices for different words, we can categorize words that exhibit similar semantic shift behavior in an unsupervised manner.

* COLING2025
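
As a rough illustration of the diachronic word similarity matrix described in the abstract above, the following minimal sketch compares the embedding of one word in every time period with its embedding in every other period. It assumes cosine similarity and assumes the per-period vectors already live in a shared space; the function names and the toy 2-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def diachronic_similarity_matrix(vectors_by_period):
    """Build a T x T matrix for ONE word.

    vectors_by_period: embeddings of the same word, ordered by time period.
    Entry [i][j] is the similarity between period i and period j.
    """
    return [[cosine(u, v) for v in vectors_by_period] for u in vectors_by_period]

# Made-up embeddings of one word in four periods (assumption, for illustration):
# the word is stable in periods 0-1, then shifts and is stable again in 2-3.
toy_vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
matrix = diachronic_similarity_matrix(toy_vectors)

for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

In this toy case the matrix shows two blocks of high similarity (periods 0-1 and periods 2-3), which is the kind of pattern a comparison restricted to adjacent periods would summarize as a single change point. Flattening such matrices for many words and passing them to a clustering algorithm would be one way to group words with similar shift behavior, though the clustering procedure used in the paper is not specified here.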

Pruning Multilingual Large Language Models for Multilingual Inference

Sep 25, 2024

Hwichan Kim, Jun Suzuki, Tosho Hirasawa, Mamoru Komachi

Abstract: Multilingual large language models (MLLMs), trained on multilingual balanced data, demonstrate better zero-shot learning performance in non-English languages compared to large language models trained on English-dominant data. However, the disparity in performance between English and non-English languages remains a challenge yet to be fully addressed. A distinctive characteristic of MLLMs is their high-quality translation capabilities, indicating an acquired proficiency in aligning between languages. This study explores how to enhance the zero-shot performance of MLLMs in non-English languages by leveraging their alignment capability between English and non-English languages. To achieve this, we first analyze the behavior of MLLMs when performing translation and reveal that there are large magnitude features that play a critical role in the translation process. Inspired by these findings, we retain the weights associated with operations involving the large magnitude features and prune other weights to force MLLMs to rely on these features for tasks beyond translation. We empirically demonstrate that this pruning strategy can enhance the MLLMs' performance in non-English languages.

* Accepted at EMNLP 2024 Findings

Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction

Mar 26, 2024

Masamune Kobayashi, Masato Mita, Mamoru Komachi

Abstract: Large Language Models (LLMs) have been reported to outperform existing automatic evaluation metrics in some tasks, such as text summarization and machine translation. However, there has been a lack of research on LLMs as evaluators in grammatical error correction (GEC). In this study, we investigate the performance of LLMs in GEC evaluation by employing prompts designed to incorporate various evaluation criteria inspired by previous research. Our extensive experimental results demonstrate that GPT-4 achieved Kendall's rank correlation of 0.662 with human judgments, surpassing all existing methods. Furthermore, in recent GEC evaluations, we underscore the significance of LLM scale and particularly emphasize the importance of fluency among evaluation criteria.
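
To make the reported Kendall's rank correlation more concrete, here is a minimal sketch of the tau-a variant, which counts concordant and discordant pairs and ignores ties. Which variant the paper uses is not stated in the abstract, so tau-a is an assumption, and the scores below are invented for illustration.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both rankings order this pair the same way
        elif sign < 0:
            discordant += 1   # the rankings disagree on this pair
        # sign == 0 means a tie in at least one ranking; tau-a skips it
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores for four corrected sentences (not data from the paper).
metric_scores = [0.9, 0.7, 0.8, 0.2]
human_scores = [4, 3, 2, 1]

tau = kendall_tau_a(metric_scores, human_scores)
print(f"tau-a = {tau:.3f}")
```

With these toy scores, five of the six pairs are ordered consistently and one is not, so the correlation is (5 - 1) / 6. For real evaluations with tied scores, a tie-aware variant such as tau-b is usually the safer choice.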

Revisiting Meta-evaluation for Grammatical Error Correction

Mar 05, 2024

Masamune Kobayashi, Masato Mita, Mamoru Komachi

Abstract: Metrics are the foundation for automatic evaluation in grammatical error correction (GEC), with the evaluation of the metrics themselves (meta-evaluation) relying on their correlation with human judgments. However, conventional meta-evaluations in English GEC encounter several challenges, including biases caused by inconsistencies in evaluation granularity and an outdated setup using classical systems. These problems can lead to misinterpretation of metrics and potentially hinder the applicability of GEC techniques. To address these issues, this paper proposes SEEDA, a new dataset for GEC meta-evaluation. SEEDA consists of corrections with human ratings along two different granularities, edit-based and sentence-based, covering 12 state-of-the-art systems including large language models (LLMs), and two human corrections with different focuses. The improved correlations obtained by aligning the granularity in the sentence-level meta-evaluation suggest that edit-based metrics may have been underestimated in existing studies. Furthermore, correlations of most metrics decrease when changing from classical to neural systems, indicating that traditional metrics are relatively poor at evaluating fluently corrected sentences with many edits.

* Accepted to TACL. This arXiv version is a pre-MIT Press publication version

WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia

May 10, 2023

Kenichiro Ando, Satoshi Sekine, Mamoru Komachi

Abstract: Wikipedia can be edited by anyone and thus contains sentences of varying quality. As a result, Wikipedia includes some poor-quality edits, which are often marked up by other editors. While editors' reviews enhance the credibility of Wikipedia, it is hard to check all edited text. Assisting in this process is very important, but a large and comprehensive dataset for studying it does not currently exist. Here, we propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. Each sentence is extracted from the entire revision history of Wikipedia, and the target quality labels were carefully investigated and selected. WikiSQE has about 3.4M sentences with 153 quality labels. In the experiment with automatic classification using competitive machine learning models, sentences that had problems with citation, syntax/semantics, or propositions were found to be more difficult to detect. In addition, we conducted automated essay scoring experiments to evaluate the generalizability of the dataset. We show that the models trained on WikiSQE perform better than the vanilla model, indicating its potential usefulness in other domains. WikiSQE is expected to be a valuable resource for other tasks in NLP.

* First draft

Is In-hospital Meta-information Useful for Abstractive Discharge Summary Generation?

Mar 10, 2023

Kenichiro Ando, Mamoru Komachi, Takashi Okumura, Hiromasa Horiguchi, Yuji Matsumoto

Abstract: During the patient's hospitalization, the physician must record daily observations of the patient and summarize them into a brief document called a "discharge summary" when the patient is discharged. Automated generation of discharge summaries can greatly relieve the physicians' burden, and has been addressed recently in the research community. Most previous studies of discharge summary generation using the sequence-to-sequence architecture focus only on inpatient notes as input. However, electronic health records (EHRs) also have rich structured metadata (e.g., hospital, physician, disease, length of stay) that might be useful. This paper investigates the effectiveness of medical meta-information for summarization tasks. We obtain four types of meta-information from the EHR systems and encode each type of meta-information into a sequence-to-sequence model. Using Japanese EHRs, meta-information encoded models increased ROUGE-1 by up to 4.45 points and BERTScore by 3.77 points over the vanilla Longformer. Also, we found that the encoded meta-information improves the precision of its related terms in the outputs. Our results showed the benefit of using medical meta-information.

* International Conference on Technologies and Applications of Artificial Intelligence (TAAI). 2022;143-148

Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan

Sep 20, 2022

Kenichiro Ando, Takashi Okumura, Mamoru Komachi, Hiromasa Horiguchi, Yuji Matsumoto

Abstract: Automated summarization of clinical texts can reduce the burden on medical professionals. "Discharge summaries" are one promising application of summarization, because they can be generated from daily inpatient records. Our preliminary experiment suggests that 20-31% of the descriptions in discharge summaries overlap with the content of the inpatient records. However, it remains unclear how the summaries should be generated from the unstructured source. To decompose the physician's summarization process, this study aimed to identify the optimal granularity in summarization. We first defined three types of summarization units with different granularities to compare the performance of the discharge summary generation: whole sentences, clinical segments, and clauses. We defined clinical segments in this study, aiming to express the smallest medically meaningful concepts. To obtain the clinical segments, it was necessary to automatically split the texts in the first stage of the pipeline. Accordingly, we compared rule-based methods and a machine learning method, and the latter outperformed the former with an F1 score of 0.846 in the splitting task. Next, we experimentally measured the accuracy of extractive summarization using the three types of units, based on the ROUGE-1 metric, on a multi-institutional national archive of health records in Japan. The measured accuracies of extractive summarization using whole sentences, clinical segments, and clauses were 31.91, 36.15, and 25.18, respectively. We found that the clinical segments yielded higher accuracy than sentences and clauses. This result indicates that summarization of inpatient records demands finer granularity than sentence-oriented processing. Although we used only Japanese health records, it can be interpreted as follows: physicians extract "concepts of medical significance" from patient records and recombine them ...

* PLOS Digital Health 1(9): e0000099. (2022)
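
For readers unfamiliar with the ROUGE-1 metric mentioned in the abstract above, the sketch below computes unigram-overlap precision, recall, and F1 between a candidate summary and a reference. It assumes pre-tokenized input and uses whitespace-split English sentences that are invented for illustration; Japanese text would first need a word segmenter, and the exact ROUGE configuration used in the paper is not given here.

```python
from collections import Counter

def rouge1(candidate_tokens, reference_tokens):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    candidate_counts = Counter(candidate_tokens)
    reference_counts = Counter(reference_tokens)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((candidate_counts & reference_counts).values())
    precision = overlap / max(len(candidate_tokens), 1)
    recall = overlap / max(len(reference_tokens), 1)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up example sentences (assumption, not data from the study).
candidate = "patient discharged after fever resolved".split()
reference = "the patient was discharged after the fever resolved".split()

precision, recall, f1 = rouge1(candidate, reference)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here all five candidate tokens appear in the eight-token reference, so precision is 1.0 and recall is 5/8. Scores such as those in the abstract are typically these values multiplied by 100.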

Learning How to Translate North Korean through South Korean

Jan 27, 2022

Hwichan Kim, Sangwhan Moon, Naoaki Okazaki, Mamoru Komachi

Abstract: South and North Korea both use the Korean language. However, Korean NLP research has focused on South Korean only, and existing NLP systems for the Korean language, such as neural machine translation (NMT) models, cannot properly handle North Korean inputs. Training a model using North Korean data is the most straightforward approach to solving this problem, but there is insufficient data to train NMT models. In this study, we create data for North Korean NMT models using a comparable corpus. First, we manually create evaluation data for automatic alignment and machine translation. Then, we investigate automatic alignment methods suitable for North Korean. Finally, we verify that a model trained on North Korean bilingual data without human annotation can significantly boost North Korean translation accuracy compared to existing South Korean models in zero-shot settings.

* 8 pages, 1 figure, 8 tables

Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction

Jan 20, 2022

Daisuke Suzuki, Yujin Takahashi, Ikumi Yamashita, Taichi Aida, Tosho Hirasawa, Michitaka Nakatsuji, Masato Mita, Mamoru Komachi

Abstract: In grammatical error correction (GEC), automatic evaluation is an important factor for research and development of GEC systems. Previous studies on automatic evaluation have demonstrated that quality estimation models built from datasets with manual evaluation can achieve high performance in automatic evaluation of English GEC without using reference sentences. However, quality estimation models have not yet been studied in Japanese, because there are no datasets for constructing quality estimation models. Therefore, in this study, we created a quality estimation dataset with manual evaluation to build an automatic evaluation model for Japanese GEC. Moreover, we conducted a meta-evaluation to verify the dataset's usefulness in building the Japanese quality estimation model.

* 8 pages (6 pages + references)

Proficiency Matters Quality Estimation in Grammatical Error Correction

Jan 17, 2022

Yujin Takahashi, Masahiro Kaneko, Masato Mita, Mamoru Komachi

Abstract: This study investigates how supervised quality estimation (QE) models of grammatical error correction (GEC) are affected by the proficiency of the learners in the data. QE models for GEC evaluations in prior work have obtained a high correlation with manual evaluations. However, the reported results have limitations in real-world contexts because the data used in prior work were biased toward learners with relatively high proficiency levels. To address this issue, we created a QE dataset that includes multiple proficiency levels and explored the necessity of performing proficiency-wise evaluation for QE of GEC. Our experiments demonstrated that differences in evaluation dataset proficiency affect the performance of QE models, and proficiency-wise evaluation helps create more robust models.

* 6 pages (4 pages + references)
