Max Hasin

HCAST: Human-Calibrated Autonomy Software Tasks

Mar 21, 2025

Measuring AI Ability to Complete Long Tasks

Mar 18, 2025

Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

Apr 08, 2024

Evaluating Language-Model Agents on Realistic Autonomous Tasks

Jan 04, 2024