Guangliang Liu

Revealing the Pragmatic Dilemma for Moral Reasoning Acquisition in Language Models

Feb 25, 2025
Via arXiv

No Free Lunch for Defending Against Prefilling Attack by In-Context Learning

Dec 13, 2024
Via arXiv

Smaller Large Language Models Can Do Moral Self-Correction

Oct 30, 2024
Via arXiv

Is Moral Self-correction An Innate Capability of Large Language Models? A Mechanistic Analysis to Self-correction

Oct 27, 2024
Via arXiv

Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis

Jul 21, 2024
Via arXiv

Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness

Jun 06, 2024
Via arXiv

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Jun 04, 2024
Via arXiv

A Data Generation Perspective to the Mechanism of In-Context Learning

Feb 03, 2024
Via arXiv

PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent

Oct 26, 2023
Via arXiv

Auto-tune: PAC-Bayes Optimization over Prior and Posterior for Neural Networks

May 30, 2023
Via arXiv