Esben Kran

Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

Oct 11, 2024
Via arXiv

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities

Oct 10, 2024
Via arXiv

DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models

Oct 03, 2023
Via arXiv

Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark

Jun 03, 2023
Via arXiv

Neuron to Graph: Interpreting Language Model Neurons at Scale

May 31, 2023
Via arXiv

N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models

Apr 22, 2023
Via arXiv