
Timothy Baldwin

Analysis of Emotion in Rumour Threads on Social Media

Feb 23, 2025

Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

Feb 21, 2025

Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models

Feb 20, 2025

Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts

Feb 19, 2025

SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning

Feb 19, 2025

RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises

Feb 18, 2025

Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models

Feb 17, 2025

Large Language Models Are Human-Like Internally

Feb 03, 2025

Training and Evaluating with Human Label Variation: An Empirical Study

Feb 03, 2025

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Dec 24, 2024