Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yiren Chen

Joint Knowledge Editing for Information Enrichment and Probability Promotion

Dec 22, 2024

Wenhang Shi, Yiren Chen, Shuqing Bian, Xinyi Zhang, Zhe Zhao, Pengfei Hu, Wei Lu, Xiaoyong Du

Abstract:Knowledge stored in large language models requires timely updates to reflect the dynamic nature of real-world information. To update the knowledge, most knowledge editing methods focus on the low layers, since recent probes into the knowledge recall process reveal that the answer information is enriched in low layers. However, these probes only and could only reveal critical recall stages for the original answers, while the goal of editing is to rectify model's prediction for the target answers. This inconsistency indicates that both the probe approaches and the associated editing methods are deficient. To mitigate the inconsistency and identify critical editing regions, we propose a contrast-based probe approach, and locate two crucial stages where the model behavior diverges between the original and target answers: Information Enrichment in low layers and Probability Promotion in high layers. Building upon the insights, we develop the Joint knowledge Editing for information Enrichment and probability Promotion (JEEP) method, which jointly edits both the low and high layers to modify the two critical recall stages. Considering the mutual interference and growing forgetting due to dual modifications, JEEP is designed to ensure that updates to distinct regions share the same objectives and are complementary. We rigorously evaluate JEEP by editing up to thousands of facts on various models, i.e., GPT-J (6B) and LLaMA (7B), and addressing diverse editing objectives, i.e., adding factual and counterfactual knowledge. In all tested scenarios, JEEP achieves best performances, validating the effectiveness of the revealings of our probe approach and the designs of our editing method. Our code and data are available at https://github.com/Eric8932/JEEP.

Via

Access Paper or Ask Questions

Create and Find Flatness: Building Flat Training Spaces in Advance for Continual Learning

Sep 20, 2023

Wenhang Shi, Yiren Chen, Zhe Zhao, Wei Lu, Kimmo Yan, Xiaoyong Du

Abstract:Catastrophic forgetting remains a critical challenge in the field of continual learning, where neural networks struggle to retain prior knowledge while assimilating new information. Most existing studies emphasize mitigating this issue only when encountering new tasks, overlooking the significance of the pre-task phase. Therefore, we shift the attention to the current task learning stage, presenting a novel framework, C&F (Create and Find Flatness), which builds a flat training space for each task in advance. Specifically, during the learning of the current task, our framework adaptively creates a flat region around the minimum in the loss landscape. Subsequently, it finds the parameters' importance to the current task based on their flatness degrees. When adapting the model to a new task, constraints are applied according to the flatness and a flat space is simultaneously prepared for the impending task. We theoretically demonstrate the consistency between the created and found flatness. In this manner, our framework not only accommodates ample parameter space for learning new tasks but also preserves the preceding knowledge of earlier tasks. Experimental results exhibit C&F's state-of-the-art performance as a standalone continual learning approach and its efficacy as a framework incorporating other methods. Our work is available at https://github.com/Eric8932/Create-and-Find-Flatness.

* 10pages, ECAI2023 conference

Via

Access Paper or Ask Questions

TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

Dec 13, 2022

Zhe Zhao, Yudong Li, Cheng Hou, Jing Zhao, Rong Tian, Weijie Liu, Yiren Chen, Ningyuan Sun, Haoyan Liu, Weiquan Mao(+16 more)

Figure 1 for TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

Figure 2 for TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

Figure 3 for TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

Figure 4 for TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

Abstract:Recently, the success of pre-training in text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is the modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all of common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new one. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.

Via

Access Paper or Ask Questions

A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

Oct 18, 2022

Kunbo Ding, Weijie Liu, Yuejian Fang, Weiquan Mao, Zhe Zhao, Tao Zhu, Haoyan Liu, Rong Tian, Yiren Chen

Figure 1 for A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

Figure 2 for A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

Figure 3 for A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

Figure 4 for A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

Abstract:Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages. To disengage from these dependencies, researchers have explored training multilingual models on English-only resources and transferring them to low-resource languages. However, its effect is limited by the gap between embedding clusters of different languages. To address this issue, we propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss, thereby improving cross-lingual transferability. Experimental results on mBERT and XLM-R demonstrate that our method significantly outperforms previous works on the zero-shot cross-lingual text classification task and can obtain a better multilingual alignment.

* Published at COLING2022

Via

Access Paper or Ask Questions

Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Apr 13, 2021

Di Wu, Yiren Chen, Liang Ding, Dacheng Tao

Figure 1 for Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Figure 2 for Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Figure 3 for Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Figure 4 for Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding

Abstract:Spoken language understanding (SLU) system usually consists of various pipeline components, where each component heavily relies on the results of its upstream ones. For example, Intent detection (ID), and slot filling (SF) require its upstream automatic speech recognition (ASR) to transform the voice into text. In this case, the upstream perturbations, e.g. ASR errors, environmental noise and careless user speaking, will propagate to the ID and SF models, thus deteriorating the system performance. Therefore, the well-performing SF and ID models are expected to be noise resistant to some extent. However, existing models are trained on clean data, which causes a \textit{gap between clean data training and real-world inference.} To bridge the gap, we propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space. Meanwhile, we design a denoising generation model to reduce the impact of the low-quality samples. Experiments on the widely-used dataset, i.e. Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment. The source code will be released.

* Work in progress

Via

Access Paper or Ask Questions

Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

Mar 07, 2021

Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, Yunhai Tong

Figure 1 for Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

Figure 2 for Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

Figure 3 for Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

Figure 4 for Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

Abstract:Pre-trained language models like BERT achieve superior performances in various NLP tasks without explicit consideration of syntactic information. Meanwhile, syntactic information has been proved to be crucial for the success of NLP applications. However, how to incorporate the syntax trees effectively and efficiently into pre-trained Transformers is still unsettled. In this paper, we address this problem by proposing a novel framework named Syntax-BERT. This framework works in a plug-and-play mode and is applicable to an arbitrary pre-trained checkpoint based on Transformer architecture. Experiments on various datasets of natural language understanding verify the effectiveness of syntax trees and achieve consistent improvement over multiple pre-trained models, including BERT, RoBERTa, and T5.

* EACL2021

Via

Access Paper or Ask Questions

AutoADR: Automatic Model Design for Ad Relevance

Oct 14, 2020

Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, Ruofei Zhang

Figure 1 for AutoADR: Automatic Model Design for Ad Relevance

Figure 2 for AutoADR: Automatic Model Design for Ad Relevance

Figure 3 for AutoADR: Automatic Model Design for Ad Relevance

Figure 4 for AutoADR: Automatic Model Design for Ad Relevance

Abstract:Large-scale pre-trained models have attracted extensive attention in the research community and shown promising results on various tasks of natural language processing. However, these pre-trained models are memory and computation intensive, hindering their deployment into industrial online systems like Ad Relevance. Meanwhile, how to design an effective yet efficient model architecture is another challenging problem in online Ad Relevance. Recently, AutoML shed new lights on architecture design, but how to integrate it with pre-trained language models remains unsettled. In this paper, we propose AutoADR (Automatic model design for AD Relevance) -- a novel end-to-end framework to address this challenge, and share our experience to ship these cutting-edge techniques into online Ad Relevance system at Microsoft Bing. Specifically, AutoADR leverages a one-shot neural architecture search algorithm to find a tailored network architecture for Ad Relevance. The search process is simultaneously guided by knowledge distillation from a large pre-trained teacher model (e.g. BERT), while taking the online serving constraints (e.g. memory and latency) into consideration. We add the model designed by AutoADR as a sub-model into the production Ad Relevance model. This additional sub-model improves the Precision-Recall AUC (PR AUC) on top of the original Ad Relevance model by 2.65X of the normalized shipping bar. More importantly, adding this automatically designed sub-model leads to a statistically significant 4.6% Bad-Ad ratio reduction in online A/B testing. This model has been shipped into Microsoft Bing Ad Relevance Production model.

* CIKM 2020

Via

Access Paper or Ask Questions

Improving BERT with Self-Supervised Attention

Apr 29, 2020

Xiaoyu Kou, Yaming Yang, Yujing Wang, Ce Zhang, Yiren Chen, Yunhai Tong, Yan Zhang, Jing Bai

Figure 1 for Improving BERT with Self-Supervised Attention

Figure 2 for Improving BERT with Self-Supervised Attention

Figure 3 for Improving BERT with Self-Supervised Attention

Figure 4 for Improving BERT with Self-Supervised Attention

Abstract:One of the most popular paradigms of applying large, pre-trained NLP models such as BERT is to fine-tune it on a smaller dataset. However, one challenge remains as the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant words in the sentences, even when they are obvious to humans, can substantially degrade the performance of these fine-tuned BERT models. In this paper, we propose a novel technique, called Self-Supervised Attention (SSA) to help facilitate this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by "probing" the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach to combine their benefits. Empirically, on a variety of public datasets, we illustrate significant performance improvement using our SSA-enhanced BERT model.

Via

Access Paper or Ask Questions

TextNAS: A Neural Architecture Search Space tailored for Text Representation

Dec 23, 2019

Yujing Wang, Yaming Yang, Yiren Chen, Jing Bai, Ce Zhang, Guinan Su, Xiaoyu Kou, Yunhai Tong, Mao Yang, Lidong Zhou

Figure 1 for TextNAS: A Neural Architecture Search Space tailored for Text Representation

Figure 2 for TextNAS: A Neural Architecture Search Space tailored for Text Representation

Figure 3 for TextNAS: A Neural Architecture Search Space tailored for Text Representation

Figure 4 for TextNAS: A Neural Architecture Search Space tailored for Text Representation

Abstract:Learning text representation is crucial for text classification and other language related tasks. There are a diverse set of text representation networks in the literature, and how to find the optimal one is a non-trivial problem. Recently, the emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve the problem. Nevertheless, most of the existing works of NAS focus on the search algorithms and pay little attention to the search space. In this paper, we argue that the search space is also an important human prior to the success of NAS in different applications. Thus, we propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets on text classification and natural language inference tasks. Furthermore, some of the design principles found in the automatic network agree well with human intuition.

Via

Access Paper or Ask Questions