Arman Cohan

MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search

Mar 26, 2025
Via arXiv

Survey on Evaluation of LLM-based Agents

Mar 20, 2025
Via arXiv

LocAgent: Graph-Guided LLM Agents for Code Localization

Mar 12, 2025
Via arXiv

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

Mar 10, 2025
Via arXiv

Investigating Generalization of One-shot LLM Steering Vectors

Feb 26, 2025
Via arXiv

ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models

Feb 24, 2025
Via arXiv

TESS 2: A Large-Scale Generalist Diffusion Language Model

Feb 19, 2025
Via arXiv

mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval

Jan 31, 2025
Via arXiv

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Jan 21, 2025
Via arXiv

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

Jan 11, 2025
Via arXiv