Soham Dan

Are LLMs Ready for Practical Adoption for Assertion Generation?
Feb 28, 2025

Few-shot Policy (de)composition in Conversational Question Answering
Jan 20, 2025

IOLBENCH: Benchmarking LLMs on Linguistic Reasoning
Jan 08, 2025

NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation
Dec 17, 2024

Benchmarking LLM Guardrails in Handling Multilingual Toxicity
Oct 29, 2024

Large Language Models can be Strong Self-Detoxifiers
Oct 04, 2024

Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models
Aug 19, 2024

Needle in the Haystack for Memory Based Large Language Models
Jul 01, 2024

AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation
Jun 26, 2024

CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design
Jun 25, 2024