Ichiro Sakata

UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation

Apr 29, 2025

Huimin Lu, Masaru Isonuma, Junichiro Mori, Ichiro Sakata

Abstract: We present UniDetox, a universally applicable method designed to mitigate toxicity across various large language models (LLMs). Previous detoxification methods are typically model-specific, addressing only individual models or model families, and require careful hyperparameter tuning due to the trade-off between detoxification efficacy and language modeling performance. In contrast, UniDetox provides a detoxification technique that can be universally applied to a wide range of LLMs without the need for separate model-specific tuning. Specifically, we propose a novel and efficient dataset distillation technique for detoxification using contrastive decoding. This approach distills detoxifying representations in the form of synthetic text data, enabling universal detoxification of any LLM through fine-tuning with the distilled text. Our experiments demonstrate that the detoxifying text distilled from GPT-2 can effectively detoxify larger models, including OPT, Falcon, and LLaMA-2. Furthermore, UniDetox eliminates the need for separate hyperparameter tuning for each model, as a single hyperparameter configuration can be seamlessly applied across different models. Additionally, analysis of the detoxifying text reveals a reduction in politically biased content, providing insights into the attributes necessary for effective detoxification of LLMs.

* Accepted at ICLR 2025 (poster)
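
The contrastive decoding idea mentioned in the UniDetox abstract can be illustrated with a minimal sketch. It assumes two next-token logit vectors, one from a base model and one from a more toxic model, and scores each token by the base log-probability minus a weighted toxic log-probability, skipping tokens the base model finds implausible. The function names, the `alpha` weight, and the `plausibility` cutoff are illustrative assumptions, not the authors' exact formulation.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_scores(base_logits, toxic_logits, alpha=1.0, plausibility=0.1):
    # Hypothetical scoring rule: log p_base - alpha * log p_toxic,
    # restricted to tokens whose base probability is at least
    # `plausibility` times the base model's top probability.
    base_lp = log_softmax(base_logits)
    toxic_lp = log_softmax(toxic_logits)
    cutoff = max(base_lp) + math.log(plausibility)
    scores = []
    for b, t in zip(base_lp, toxic_lp):
        if b < cutoff:
            scores.append(float("-inf"))  # implausible under the base model
        else:
            scores.append(b - alpha * t)
    return scores

def pick_token(base_logits, toxic_logits, **kwargs):
    # Greedy choice of the token index with the highest contrastive score.
    scores = contrastive_scores(base_logits, toxic_logits, **kwargs)
    return max(range(len(scores)), key=scores.__getitem__)
```

The plausibility cutoff is a common safeguard in contrastive decoding: without it, a token that both models consider very unlikely could receive a high score simply because the toxic model dislikes it even more.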

Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation

Jul 24, 2024

Huimin Lu, Masaru Isonuma, Junichiro Mori, Ichiro Sakata

Abstract: Large language models (LLMs) often inherit biases from their vast training corpora. Traditional debiasing methods, while effective to some extent, do not completely eliminate memorized biases and toxicity in LLMs. In this paper, we study an unlearning-based approach to debiasing in LLMs by performing gradient ascent on hate speech against minority groups, i.e., minimizing the likelihood of biased or toxic content. Specifically, we propose a masked language modeling unlearning technique, which unlearns the harmful part of the text. This method enables LLMs to selectively forget and disassociate from biased and harmful content. Experimental results demonstrate the effectiveness of our approach in diminishing bias while maintaining language modeling abilities. Surprisingly, the results also reveal a potential for cross-domain transfer unlearning: debiasing in one bias form (e.g., gender) may contribute to mitigating others (e.g., race and religion).
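
As a toy illustration of gradient ascent restricted to harmful tokens, the sketch below treats per-position logits as the only trainable parameters and raises the negative log-likelihood of the target token at positions flagged as harmful, leaving other positions untouched. The `harmful_mask` input, the function names, and the step settings are assumptions for illustration; the paper's actual training procedure is not reproduced here.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def unlearn_step(logits, target, lr=0.5):
    # One gradient-ascent step on the loss -log p[target].
    # The gradient of that loss w.r.t. the logits is probs - onehot(target),
    # so moving along it lowers the probability of `target`.
    probs = softmax(logits)
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return [x + lr * g for x, g in zip(logits, grad)]

def unlearn_masked(position_logits, targets, harmful_mask, steps=10, lr=0.5):
    # Apply ascent only where harmful_mask is True (hypothetical mask
    # marking the harmful span of the text).
    out = []
    for logits, target, harmful in zip(position_logits, targets, harmful_mask):
        if harmful:
            for _ in range(steps):
                logits = unlearn_step(logits, target, lr)
        out.append(list(logits))
    return out
```

In a real model the update would flow into shared weights rather than per-position logits, which is why unlearning one span can affect predictions elsewhere.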

Differentiable Instruction Optimization for Cross-Task Generalization

Jun 16, 2023

Masaru Isonuma, Junichiro Mori, Ichiro Sakata

Abstract: Instruction tuning has attracted much attention as a way to achieve generalization across a wide variety of tasks. Although various types of instructions have been manually created for instruction tuning, it is still unclear what kind of instruction is optimal for cross-task generalization. This work presents instruction optimization, which optimizes training instructions with respect to generalization ability. Rather than manually tuning instructions, we introduce learnable instructions and optimize them with gradient descent by leveraging bilevel optimization. Experimental results show that the learned instruction enhances the diversity of instructions and improves generalization ability compared to using only manually created instructions.

* 14 pages, 6 figures, accepted for Findings of ACL 2023
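
The bilevel structure described in this abstract can be sketched with a scalar toy problem. Here a hypothetical scalar `phi` stands in for a learnable instruction and `theta` for model parameters: the inner loop trains `theta` on a loss that depends on `phi`, and the outer loop updates `phi` by gradient descent on a separate loss evaluated at the trained `theta`, differentiating through the unrolled inner steps. Both quadratic losses and all hyperparameters are assumptions chosen for illustration, not the paper's setup.

```python
def inner_train(phi, theta0=0.0, eta=0.1, steps=5):
    # Unrolled inner loop: gradient descent on the toy loss (theta - phi)^2.
    # Also tracks d(theta)/d(phi) by hand so the outer loop can
    # differentiate through the inner training.
    theta, dtheta_dphi = theta0, 0.0
    for _ in range(steps):
        theta = theta - eta * 2.0 * (theta - phi)
        dtheta_dphi = dtheta_dphi - eta * 2.0 * (dtheta_dphi - 1.0)
    return theta, dtheta_dphi

def optimize_instruction(target, phi=0.0, outer_lr=0.5, outer_steps=200):
    # Outer loop: gradient descent on the toy loss (theta(phi) - target)^2,
    # using the chain rule through the inner loop.
    for _ in range(outer_steps):
        theta, dtheta_dphi = inner_train(phi)
        outer_grad = 2.0 * (theta - target) * dtheta_dphi
        phi -= outer_lr * outer_grad
    return phi
```

Because the inner loop runs only a few steps, the trained `theta` does not fully reach `phi`, so the outer loop learns to overshoot: the optimized `phi` ends up larger in magnitude than `target` so that the partially trained `theta` lands on it.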

SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation

May 24, 2023

Tetsu Kasanishi, Masaru Isonuma, Junichiro Mori, Ichiro Sakata

Abstract: Automatic literature review generation is one of the most challenging tasks in natural language processing. Although large language models have tackled literature review generation, the absence of large-scale datasets has been a stumbling block to progress. We release SciReviewGen, consisting of over 10,000 literature reviews and 690,000 papers cited in the reviews. Based on the dataset, we evaluate recent transformer-based summarization models on the literature review generation task, including Fusion-in-Decoder extended for literature review generation. Human evaluation results show that some machine-generated summaries are comparable to human-written reviews, while revealing the challenges of automatic literature review generation such as hallucinations and a lack of detailed information. Our dataset and code are available at https://github.com/tetsu9923/SciReviewGen.

* Findings of ACL 2023 (to appear). arXiv admin note: text overlap with arXiv:1810.04020 by other authors

Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance

Jun 15, 2021

Masaru Isonuma, Junichiro Mori, Danushka Bollegala, Ichiro Sakata

Abstract: This paper presents a novel unsupervised abstractive summarization method for opinionated texts. While basic variational autoencoder-based models assume a unimodal Gaussian prior for the latent code of sentences, we replace it with a recursive Gaussian mixture, where each mixture component corresponds to the latent code of a topic sentence and is mixed by a tree-structured topic distribution. By decoding each Gaussian component, we generate sentences with tree-structured topic guidance, where the root sentence conveys generic content, and the leaf sentences describe specific topics. Experimental results demonstrate that the generated topic sentences are appropriate as a summary of opinionated texts; they are more informative and cover more of the input content than those generated by a recent unsupervised summarization model (Bražinskas et al., 2020). Furthermore, we demonstrate that the variance of latent Gaussians represents the granularity of sentences, analogous to Gaussian word embedding (Vilnis and McCallum, 2015).

* Accepted to TACL, pre-MIT Press publication version

Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking

Jun 13, 2019

Masaru Isonuma, Junichiro Mori, Ichiro Sakata

Abstract: This paper focuses on the end-to-end abstractive summarization of a single product review without supervision. We assume that a review can be described as a discourse tree, in which the summary is the root, and the child sentences explain their parent in detail. By recursively estimating a parent from its children, our model learns the latent discourse tree without an external parser and generates a concise summary. We also introduce an architecture that ranks the importance of each sentence on the tree to support summary generation focusing on the main review point. The experimental results demonstrate that our model is competitive with or outperforms other unsupervised approaches. In particular, for relatively long reviews, it achieves performance competitive with or better than that of supervised models. The induced tree shows that the child sentences provide additional information about their parent, and the generated summary abstracts the entire review.

* 13 pages, ACL 2019 (long paper)
