Timothy Baldwin

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Dec 24, 2024

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Dec 10, 2024

Arabic Dataset for LLM Safeguard Evaluation

Oct 22, 2024

ToolGen: Unified Tool Retrieval and Calling via Generation

Oct 04, 2024

Loki: An Open-Source Tool for Fact Verification

Oct 02, 2024

Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models

Aug 20, 2024

To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction

Aug 05, 2024

Inference-Time Selective Debiasing

Jul 27, 2024

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Jun 28, 2024

Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

Jun 21, 2024