Picture for Haonan Li

Haonan Li

SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning

Add code
Feb 19, 2025
Viaarxiv icon

RuozhiBench: Evaluating LLMs with Logical Fallacies and Misleading Premises

Add code
Feb 18, 2025
Viaarxiv icon

LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch

Add code
Jan 16, 2025
Viaarxiv icon

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Add code
Dec 24, 2024
Figure 1 for Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Figure 2 for Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Figure 3 for Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Figure 4 for Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Viaarxiv icon

Bi-Mamba: Towards Accurate 1-Bit State Space Models

Add code
Nov 18, 2024
Viaarxiv icon

ToolGen: Unified Tool Retrieval and Calling via Generation

Add code
Oct 04, 2024
Figure 1 for ToolGen: Unified Tool Retrieval and Calling via Generation
Figure 2 for ToolGen: Unified Tool Retrieval and Calling via Generation
Figure 3 for ToolGen: Unified Tool Retrieval and Calling via Generation
Figure 4 for ToolGen: Unified Tool Retrieval and Calling via Generation
Viaarxiv icon

Loki: An Open-Source Tool for Fact Verification

Add code
Oct 02, 2024
Figure 1 for Loki: An Open-Source Tool for Fact Verification
Figure 2 for Loki: An Open-Source Tool for Fact Verification
Figure 3 for Loki: An Open-Source Tool for Fact Verification
Figure 4 for Loki: An Open-Source Tool for Fact Verification
Viaarxiv icon

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Add code
Jun 28, 2024
Figure 1 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Figure 2 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Figure 3 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Figure 4 for Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Viaarxiv icon

Lessons from the Trenches on Reproducible Evaluation of Language Models

Add code
May 23, 2024
Figure 1 for Lessons from the Trenches on Reproducible Evaluation of Language Models
Figure 2 for Lessons from the Trenches on Reproducible Evaluation of Language Models
Figure 3 for Lessons from the Trenches on Reproducible Evaluation of Language Models
Figure 4 for Lessons from the Trenches on Reproducible Evaluation of Language Models
Viaarxiv icon

3D Hand Mesh Recovery from Monocular RGB in Camera Space

Add code
May 12, 2024
Viaarxiv icon