Chirag Shah

Thinking Ahead: Prospection-Guided Retrieval of Memory with Language Models

May 13, 2026
Via arXiv

Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents

May 13, 2026
Via arXiv

iAgentBench: Benchmarking Sensemaking Capabilities of Information-Seeking Agents on High-Traffic Topics

Mar 04, 2026
Via arXiv

Knowing Isn't Understanding: Re-grounding Generative Proactivity with Epistemic and Behavioral Insight

Feb 16, 2026
Via arXiv

ClaimDB: A Fact Verification Benchmark over Large Structured Data

Jan 21, 2026
Via arXiv

The PROPER Approach to Proactivity: Benchmarking and Advancing Knowledge Gap Navigation

Jan 16, 2026
Via arXiv

I Think, Therefore I Am Under-Qualified? A Benchmark for Evaluating Linguistic Shibboleth Detection in LLM Hiring Evaluations

Aug 06, 2025
Via arXiv

LLM-Driven Usefulness Judgment for Web Search Evaluation

Apr 19, 2025
Via arXiv

LLM-Driven Usefulness Labeling for IR Evaluation

Mar 12, 2025
Via arXiv

Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets

Mar 06, 2025
Via arXiv