Xiangru Tang

LocAgent: Graph-Guided LLM Agents for Code Localization

Mar 12, 2025
Via arXiv

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

Mar 10, 2025
Via arXiv

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

Mar 03, 2025
Via arXiv

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Jan 21, 2025
Via arXiv

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

Jan 11, 2025
Via arXiv

ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain

Nov 23, 2024
Via arXiv

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Nov 08, 2024
Via arXiv

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Jul 23, 2024
Via arXiv

Step-Back Profiling: Distilling User History for Personalized Scientific Writing

Jun 20, 2024
Via arXiv

Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation

Jun 20, 2024
Via arXiv