Ansh Radhakrishnan

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

Nov 26, 2024
Via arXiv

Debating with More Persuasive LLMs Leads to More Truthful Answers

Feb 15, 2024
Via arXiv

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Via arXiv

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

Jul 25, 2023
Via arXiv

Measuring Faithfulness in Chain-of-Thought Reasoning

Jul 17, 2023
Via arXiv