Rongwu Xu

Long$^2$RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall

Oct 31, 2024
Via arXiv

Sing it, Narrate it: Quality Musical Lyrics Translation

Oct 29, 2024

On the Role of Attention Heads in Large Language Model Safety

Oct 17, 2024

DebateQA: Evaluating Question Answering on Debatable Knowledge

Aug 02, 2024

Course-Correction: Safety Alignment Using Synthetic Preferences

Jul 23, 2024

Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias

Jul 22, 2024

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Jun 20, 2024

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Jun 09, 2024

Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

May 31, 2024

Exploring Chinese Humor Generation: A Study on Two-Part Allegorical Sayings

Mar 16, 2024