Timothy Baldwin

Arabic Dataset for LLM Safeguard Evaluation

Oct 22, 2024

ToolGen: Unified Tool Retrieval and Calling via Generation

Oct 04, 2024

Loki: An Open-Source Tool for Fact Verification

Oct 02, 2024

Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models

Aug 20, 2024

To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction

Aug 05, 2024

Inference-Time Selective Debiasing

Jul 27, 2024

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Jun 28, 2024

Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

Jun 21, 2024

Evaluating Transparency of Machine Generated Fact Checking Explanations

Jun 18, 2024

Revisiting subword tokenization: A case study on affixal negation in large language models

Apr 04, 2024