Hjalmar Wijk

RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts

Nov 22, 2024
Via arXiv

Evaluating Language-Model Agents on Realistic Autonomous Tasks

Jan 04, 2024
Via arXiv

Robustness Guarantees for Credal Bayesian Networks via Constraint Relaxation over Probabilistic Circuits

May 11, 2022
Via arXiv

Shielding Atari Games with Bounded Prescience

Jan 22, 2021
Via arXiv