Kristen Marie Johnson

Revealing the Pragmatic Dilemma for Moral Reasoning Acquisition in Language Models

Feb 25, 2025

No Free Lunch for Defending Against Prefilling Attack by In-Context Learning

Dec 13, 2024

Smaller Large Language Models Can Do Moral Self-Correction

Oct 30, 2024

Is Moral Self-correction An Innate Capability of Large Language Models? A Mechanistic Analysis to Self-correction

Oct 27, 2024

Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis

Jul 21, 2024

PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent

Oct 26, 2023