Benedikt Stroebl

Dynamic Risk Assessments for Offensive Cybersecurity Agents

May 23, 2025
Via arXiv

Localized Cultural Knowledge is Conserved and Controllable in Large Language Models

Apr 14, 2025
Via arXiv

Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers

Dec 02, 2024
Via arXiv

Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers

Nov 26, 2024
Via arXiv

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Sep 17, 2024
Via arXiv

AI Agents That Matter

Jul 01, 2024
Via arXiv