Timothy Baldwin

SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning

Feb 19, 2025
Via arXiv

Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts

Feb 19, 2025
Via arXiv

RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises

Feb 18, 2025
Via arXiv

Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models

Feb 17, 2025
Via arXiv

Training and Evaluating with Human Label Variation: An Empirical Study

Feb 03, 2025
Via arXiv

Large Language Models Are Human-Like Internally

Feb 03, 2025
Via arXiv

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Dec 24, 2024
Via arXiv

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Dec 10, 2024
Via arXiv

Arabic Dataset for LLM Safeguard Evaluation

Oct 22, 2024
Via arXiv

ToolGen: Unified Tool Retrieval and Calling via Generation

Oct 04, 2024
Via arXiv