Jayanth Srinivasa

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench

Aug 28, 2025
Via arXiv

EXP-Bench: Can AI Conduct AI Research Experiments?

May 30, 2025
Via arXiv

An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems

May 23, 2025
Via arXiv

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

May 22, 2025
Via arXiv

Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection

May 18, 2025
Via arXiv

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Apr 09, 2025
Via arXiv

Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval

Mar 12, 2025
Via arXiv

Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Feb 26, 2025
Via arXiv

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Feb 18, 2025
Via arXiv

Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning

Feb 08, 2025
Via arXiv