Haonan Li

ToolGen: Unified Tool Retrieval and Calling via Generation

Oct 04, 2024
Via arXiv

Loki: An Open-Source Tool for Fact Verification

Oct 02, 2024
Via arXiv

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Jun 28, 2024
Via arXiv

Lessons from the Trenches on Reproducible Evaluation of Language Models

May 23, 2024
Via arXiv

3D Hand Mesh Recovery from Monocular RGB in Camera Space

May 12, 2024
Via arXiv

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

Mar 31, 2024
Via arXiv

EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

Mar 15, 2024
Via arXiv

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

Mar 07, 2024
Via arXiv

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic

Feb 20, 2024
Via arXiv

A Chinese Dataset for Evaluating the Safeguards in Large Language Models

Feb 19, 2024
Via arXiv