Abstract:Existing works have studied the impact of word order within natural text. They usually analyze it by destroying the original order of words to create a scrambled sequence, and then comparing models' performance on the original and scrambled sequences. The experimental results demonstrate marginal drops. Considering these findings, different hypotheses about word order have been proposed, including ``the order of words is redundant with lexical semantics'' and ``models do not rely on word order''. In this paper, we revisit the aforementioned hypotheses by adding an order reconstruction perspective and selecting datasets spanning a broader spectrum. Specifically, we first select four different datasets, and then design order reconstruction and continuing generation tasks. Empirical findings support that ChatGPT relies on word order to infer, but cannot support or negate the redundancy relation between word order and lexical semantics.
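As an illustration of the scrambling setup described in this abstract, the sketch below shows one plausible way to build a scrambled input and score an order reconstruction; the prompt wording and the position-match metric are assumptions for illustration, not the paper's exact protocol.

```python
import random

def scramble(sentence: str, seed: int = 0) -> str:
    """Destroy the original word order by shuffling the tokens."""
    words = sentence.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)

def reconstruction_accuracy(original: str, reconstructed: str) -> float:
    """Fraction of positions where the reconstructed word matches the original."""
    orig, rec = original.split(), reconstructed.split()
    if not orig:
        return 0.0
    return sum(o == r for o, r in zip(orig, rec)) / len(orig)

sentence = "the order of words carries syntactic information"
shuffled = scramble(sentence)
# A model (e.g., ChatGPT) would be prompted to restore the order of `shuffled`;
# here we only show how the prompt input and the metric could be assembled.
prompt = f"Reorder the following words into a fluent sentence: {shuffled}"
print(prompt)
print(reconstruction_accuracy(sentence, sentence))  # 1.0 for a perfect reconstruction
```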
Abstract:Probing the memorization of large language models holds significant importance. Previous works have established metrics for quantifying memorization, explored various influencing factors, such as data duplication, model size, and prompt length, and evaluated memorization by comparing model outputs with training corpora. However, the training corpora are of enormous scale, and their pre-processing is time-consuming. To explore memorization without accessing training data, we propose a novel approach, named ROME, wherein memorization is explored by comparing disparities between memorized and non-memorized samples. Specifically, the selected samples are first categorized into memorized and non-memorized groups, and the two groups are then compared from the perspectives of text, probability, and hidden state. Experimental findings show disparities in factors including word length, part-of-speech, word frequency, and mean and variance, just to name a few.
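A minimal sketch of the comparison step described above, assuming memorization is judged by an exact continuation match and that simple per-group statistics (word length, token-probability mean and variance) are then contrasted; the criterion and the features are illustrative, not necessarily ROME's exact implementation.

```python
from statistics import mean, variance

def is_memorized(model_continuation: str, reference_continuation: str) -> bool:
    """Illustrative criterion: the model reproduces the reference continuation verbatim."""
    return model_continuation.strip() == reference_continuation.strip()

def group_statistics(samples):
    """Contrast simple text/probability features between memorized and non-memorized groups."""
    groups = {"memorized": [], "non_memorized": []}
    for s in samples:
        key = "memorized" if is_memorized(s["output"], s["reference"]) else "non_memorized"
        groups[key].append(s)
    report = {}
    for name, group in groups.items():
        if not group:
            continue
        probs = [p for s in group for p in s["token_probs"]]
        report[name] = {
            "avg_word_length": mean(len(w) for s in group for w in s["output"].split()),
            "prob_mean": mean(probs),
            "prob_variance": variance(probs) if len(probs) > 1 else 0.0,
        }
    return report

samples = [
    {"output": "Paris is the capital of France", "reference": "Paris is the capital of France",
     "token_probs": [0.9, 0.95, 0.99, 0.97, 0.98, 0.96]},
    {"output": "a city in western Europe", "reference": "Lyon is a city in France",
     "token_probs": [0.4, 0.5, 0.3, 0.6, 0.45]},
]
print(group_statistics(samples))
```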
Abstract:Word order is an important concept in natural language, and in this work, we study how word order affects the induction of world knowledge from raw text using language models. We use word analogies to probe for such knowledge. Specifically, in addition to the natural word order, we first extract texts in six fixed word orders from five languages and then pretrain language models on these texts. Finally, we analyze the experimental results of the fixed word orders on word analogies and show that i) certain fixed word orders consistently outperform or underperform others, though the specifics vary across languages, and ii) the Wov2Lex hypothesis does not hold in pre-trained language models, and the natural word order typically yields mediocre results. The source code will be made publicly available at https://github.com/lshowway/probing_by_analogy.
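A brief sketch of how a clause could be rewritten into fixed word orders, assuming the six orders are the permutations of subject, verb, and object; the role extraction and the analogy scoring are illustrative assumptions rather than the paper's pipeline.

```python
import itertools

# Assumed: the six fixed orders are the permutations of subject (S), verb (V), object (O).
FIXED_ORDERS = ["".join(p) for p in itertools.permutations("SVO")]  # SVO, SOV, VSO, ...

def reorder(subject: str, verb: str, obj: str, order: str) -> str:
    """Rewrite an (S, V, O) clause into one fixed word order."""
    slots = {"S": subject, "V": verb, "O": obj}
    return " ".join(slots[role] for role in order)

for order in FIXED_ORDERS:
    print(order, "->", reorder("the cat", "chased", "the mouse", order))

# After pretraining a model on each reordered corpus, world knowledge is probed
# with word analogies such as "Paris : France :: Rome : ?", typically scored by a
# nearest-neighbour search over embedding offsets (v_France - v_Paris + v_Rome).
```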
Abstract:Few studies have examined the effects of tonal coarticulation and prosodic position on the low rising tone in Xiamen Dialect. This study addresses this issue. To do so, a new method, the Tonal Contour Analysis in Tonal Triangle, is proposed to measure the subtle curvature of the tonal contour. Findings are as follows: (1) The low rising tone in Xiamen Dialect has a tendency towards the falling-rising tone, which is significantly affected by tonal coarticulation and prosodic position. (2) The low rising tone is realized as a falling-rising tone when preceded by a tone with a high offset, and as a low rising tone when preceded by a tone that ends low. (3) The curvature of the low rising tone is greatest in the sentence-initial position and is positively correlated with its own duration.
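To make the notion of contour curvature concrete, here is a simple sketch that measures how far an f0 contour departs from the straight line joining its onset and offset; this is only an illustrative stand-in, not the exact formulation of the Tonal Contour Analysis in Tonal Triangle proposed in the paper.

```python
def contour_curvature(f0_contour):
    """Illustrative curvature measure: maximum deviation of the f0 contour
    from the straight line (chord) connecting its onset and offset."""
    n = len(f0_contour)
    if n < 3:
        return 0.0
    onset, offset = f0_contour[0], f0_contour[-1]
    deviations = []
    for i, f0 in enumerate(f0_contour):
        chord = onset + (offset - onset) * i / (n - 1)  # straight-line value at point i
        deviations.append(f0 - chord)
    return max(abs(d) for d in deviations)

# A falling-rising realization deviates more from its chord than a plain rise.
falling_rising = [180, 160, 150, 155, 175, 200]
plain_rise = [150, 158, 167, 178, 189, 200]
print(contour_curvature(falling_rising), contour_curvature(plain_rise))
```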
Abstract:External knowledge, e.g., entities and entity descriptions, can help humans understand texts. Many works have explored including external knowledge in pre-trained models. These methods generally either design pre-training tasks and implicitly introduce knowledge by updating model weights, or use it directly alongside the original text. Though effective, these approaches have some limitations. On the one hand, the knowledge injection is implicit and attention is paid only to model weights, while the pre-trained entity embeddings are ignored. On the other hand, entity descriptions may be lengthy, and inputting them into the model together with the original text may distract the model's attention. This paper aims to explicitly include both entities and entity descriptions in the fine-tuning stage. First, the pre-trained entity embeddings are fused with the original text representation and updated by the backbone model layer by layer. Second, descriptions are represented by a knowledge module outside the backbone model, and each knowledge layer is selectively connected to one backbone layer for fusion. Third, two knowledge-related auxiliary tasks, i.e., the entity/description enhancement task and the entity enhancement/pollution task, are designed to smooth the semantic gaps among the evolved representations. We conducted experiments on four knowledge-oriented tasks and two common tasks, and the results achieve new state-of-the-art performance on several datasets. Besides, we conduct an ablation study to show that each module in our method is necessary. The code is available at https://github.com/lshowway/Ered.
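A minimal sketch of the layer-by-layer fusion idea, assuming each entity embedding is projected into the backbone's hidden space and added to the tokens it is linked to before every backbone layer; the module names, the alignment matrix, and the additive fusion are assumptions for illustration, not Ered's exact architecture.

```python
import torch
import torch.nn as nn

class LayerwiseEntityFusion(nn.Module):
    """Sketch: project pre-trained entity embeddings into the backbone's hidden
    space and add them to the token representations before every layer."""
    def __init__(self, num_layers: int, hidden_size: int, entity_dim: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.entity_proj = nn.Linear(entity_dim, hidden_size)

    def forward(self, token_states, entity_embeddings, entity_to_token):
        # entity_to_token: (batch, seq_len, num_entities) alignment matrix that
        # scatters each entity embedding onto the tokens it is linked to.
        entity_states = self.entity_proj(entity_embeddings)
        for layer in self.layers:
            fused = token_states + entity_to_token @ entity_states
            token_states = layer(fused)
        return token_states

model = LayerwiseEntityFusion(num_layers=2, hidden_size=64, entity_dim=32)
tokens = torch.randn(1, 10, 64)   # token representations of the original text
entities = torch.randn(1, 3, 32)  # pre-trained embeddings of 3 linked entities
alignment = torch.zeros(1, 10, 3)
alignment[0, 2, 0] = alignment[0, 5, 1] = alignment[0, 7, 2] = 1.0
print(model(tokens, entities, alignment).shape)  # torch.Size([1, 10, 64])
```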
Abstract:In this paper, we study a sentiment analysis task where the outcomes are mainly contributed by a few key elements of the inputs. Motivated by the two-streams hypothesis, we propose a neural architecture, named TraceNet, to address this type of task. It not only learns discriminative representations for the target task via its encoders, but also traces key elements at the same time via its locators. In TraceNet, both encoders and locators are organized in a layer-wise manner, and a smoothness regularization is employed between adjacent encoder-locator combinations. Moreover, sparsity constraints are enforced on the locators for tracing purposes, and items are proactively masked according to the item weights output by the locators. A major advantage of TraceNet is that the outcomes are easier to understand, since the most responsible parts of the inputs are identified. Also, under the guidance of the locators, it is more robust to attacks due to its focus on key elements and the proactive masking training strategy. Experimental results show its effectiveness for sentiment classification. Moreover, we provide several case studies to demonstrate its robustness and interpretability.
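The sketch below illustrates one encoder-locator combination and the proactive masking strategy, assuming the locator produces softmax weights over input items and masking drops items in proportion to those weights during training; this is a hedged reconstruction of the idea, not TraceNet's exact layer design.

```python
import torch
import torch.nn as nn

class EncoderLocatorLayer(nn.Module):
    """Sketch of one encoder-locator combination: the locator scores each
    input item, and the encoder re-weights items by those scores."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)
        self.locator = nn.Linear(hidden_size, 1)

    def forward(self, x, mask_ratio: float = 0.0):
        weights = torch.softmax(self.locator(x).squeeze(-1), dim=-1)  # item importance
        if self.training and mask_ratio > 0:
            # Proactive masking: randomly zero out items in proportion
            # to the locator weights during training.
            drop = torch.bernoulli(weights.clamp(max=1.0) * mask_ratio)
            x = x * (1 - drop).unsqueeze(-1)
        encoded = torch.relu(self.encoder(x)) * weights.unsqueeze(-1)
        return encoded, weights

layer = EncoderLocatorLayer(hidden_size=16)
tokens = torch.randn(2, 8, 16)  # a batch of 2 inputs with 8 items each
encoded, weights = layer(tokens, mask_ratio=0.3)
print(encoded.shape, weights.shape)  # torch.Size([2, 8, 16]) torch.Size([2, 8])
# A sparsity penalty (e.g., an L1 term on `weights`) can be added to the loss
# so that only a few key items receive non-negligible weight.
```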
Abstract:Though some recent works focus on injecting sentiment knowledge into pre-trained language models, they usually design mask-and-reconstruction tasks in the post-training phase. In this paper, we aim to benefit from sentiment knowledge in a lighter way. To achieve this goal, we study sentence-level sentiment analysis and, correspondingly, propose two sentiment-aware auxiliary tasks named sentiment word cloze and conditional sentiment prediction. The first task learns to select the correct sentiment words within the input, given the overall sentiment polarity as prior knowledge. Conversely, the second task predicts the overall sentiment polarity, given the sentiment polarity of a word as prior knowledge. In addition, two kinds of label combination methods are investigated to unify multiple types of labels in each task. We argue that more information can encourage models to learn more profound semantic representations, and we implement our approach in a straightforward way to verify this hypothesis. The experimental results demonstrate that our approach consistently outperforms pre-trained models and is additive to existing knowledge-enhanced post-trained models. The code and data are released at https://github.com/lshowway/KESA.
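As a concrete illustration of the two auxiliary tasks, the sketch below constructs training examples from one labeled sentence and a toy sentiment lexicon; the lexicon, the masking token, and the example format are illustrative assumptions, not KESA's exact data construction.

```python
# Illustrative sentiment lexicon; a real setup would use a curated resource.
LEXICON = {"great": "positive", "boring": "negative", "love": "positive", "awful": "negative"}

def build_auxiliary_examples(sentence: str, overall_polarity: str):
    """Construct inputs for the two auxiliary tasks from one labeled sentence."""
    words = sentence.split()
    sentiment_words = [w for w in words if w in LEXICON]
    # Task 1 (sentiment word cloze): mask a sentiment word and ask the model to
    # recover it, given the overall polarity as prior knowledge.
    cloze_examples = []
    for target in sentiment_words:
        masked = " ".join("[MASK]" if w == target else w for w in words)
        cloze_examples.append({"input": masked, "prior": overall_polarity, "answer": target})
    # Task 2 (conditional sentiment prediction): predict the overall polarity,
    # given the polarity of a sentiment word as prior knowledge.
    conditional_examples = [
        {"input": sentence, "prior": LEXICON[w], "answer": overall_polarity}
        for w in sentiment_words
    ]
    return cloze_examples, conditional_examples

cloze, conditional = build_auxiliary_examples("i love this great movie", "positive")
print(cloze)
print(conditional)
```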
Abstract:News recommendation is an effective information dissemination solution in modern society. While recent years have witnessed many promising news recommendation models, they mostly capture user-news interactions at the document level in a static manner. However, in real-world scenarios, news can be quite complex and diverse, and blindly squeezing all the content into a single embedding vector can be less effective in extracting information compatible with the personalized preferences of users. In addition, user preferences in the news recommendation scenario can be highly dynamic, and a tailored dynamic mechanism should be designed for better recommendation performance. In this paper, we propose a novel dynamic news recommendation model. To better understand the news content, we leverage the attention mechanism to represent the news at the sentence, element, and document levels, respectively. To capture users' dynamic preferences, continuous time information is seamlessly incorporated into the computation of the attention weights. More specifically, we design a hierarchical attention network, where the lower layer learns the importance of different sentences and elements, and the upper layer captures the correlations between previously interacted news and the target news. To comprehensively model the dynamic characteristics, we first enhance the traditional attention mechanism by incorporating both absolute and relative time information, and then we propose a dynamic negative sampling method to optimize the users' implicit feedback. We conduct extensive experiments on three real-world datasets to demonstrate our model's effectiveness. Our source code and pre-trained representations are available at https://github.com/lshowway/D-HAN.
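A minimal sketch of time-aware attention between the target news and previously read news, assuming relative time gaps discount the attention scores through a learnable decay term; the decay form and shapes are assumptions for illustration, not D-HAN's exact attention formulation.

```python
import torch
import torch.nn as nn

class TimeAwareAttention(nn.Module):
    """Sketch: attention between the target news and previously interacted
    news, with scores modulated by relative time gaps."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.time_decay = nn.Parameter(torch.tensor(0.1))

    def forward(self, target_news, history_news, history_timestamps, target_timestamp):
        # Relative time: how long ago each historical interaction happened.
        rel_time = (target_timestamp.unsqueeze(1) - history_timestamps).clamp(min=0)
        scores = (self.query(target_news).unsqueeze(1) * self.key(history_news)).sum(-1)
        scores = scores - self.time_decay * torch.log1p(rel_time)  # older items are discounted
        weights = torch.softmax(scores, dim=-1)
        return (weights.unsqueeze(-1) * history_news).sum(dim=1), weights

attn = TimeAwareAttention(hidden_size=32)
history = torch.randn(1, 5, 32)  # 5 previously read news articles
target = torch.randn(1, 32)      # candidate news article
hist_ts = torch.tensor([[0., 3600., 7200., 86400., 172800.]])  # interaction times in seconds
user_vec, weights = attn(target, history, hist_ts, torch.tensor([172800.]))
print(user_vec.shape, weights)
```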
Abstract:Few-shot relation extraction (FSRE) is of great importance for long-tail distribution problems, especially in specialized domains with low-resource data. Most existing FSRE algorithms fail to accurately classify relations merely based on the information of the sentences together with the recognized entity pairs, due to limited samples and a lack of knowledge. To address this problem, in this paper, we propose a novel entity CONCEPT-enhanced FEw-shot Relation Extraction scheme (ConceptFERE), which introduces the inherent concepts of entities to provide clues for relation prediction and boost relation classification performance. Firstly, a concept-sentence attention module is developed to select the most appropriate concept from the multiple concepts of each entity by calculating the semantic similarity between sentences and concepts. Secondly, a self-attention based fusion module is presented to bridge the gap between concept embeddings and sentence embeddings from different semantic spaces. Extensive experiments on the FSRE benchmark dataset FewRel demonstrate the effectiveness and superiority of the proposed ConceptFERE scheme as compared to state-of-the-art baselines. Code is available at https://github.com/LittleGuoKe/ConceptFERE.
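The sketch below illustrates the concept selection step, assuming the semantic similarity between the sentence embedding and each candidate concept embedding is measured with cosine similarity and turned into attention weights; the similarity function and shapes are assumptions, not necessarily ConceptFERE's exact module.

```python
import torch
import torch.nn.functional as F

def select_concept(sentence_emb, concept_embs):
    """Illustrative concept-sentence attention: pick, for each entity, the
    concept whose embedding is most similar to the sentence embedding."""
    sims = F.cosine_similarity(sentence_emb.unsqueeze(0), concept_embs, dim=-1)
    weights = torch.softmax(sims, dim=-1)  # attention over candidate concepts
    best = torch.argmax(weights).item()
    return best, weights

sentence_emb = torch.randn(64)     # encoding of the sentence with its entity pair
concept_embs = torch.randn(4, 64)  # embeddings of 4 candidate concepts for one entity
best, weights = select_concept(sentence_emb, concept_embs)
print(best, weights)
# The selected concept embedding would then be fused with the sentence embedding
# through a self-attention based module before relation classification.
```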