Baishakhi Ray

Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination

Mar 06, 2025

Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation

Feb 23, 2025

AI Software Engineer: Programming with Trust

Feb 19, 2025

CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation

Jan 14, 2025

Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection

Dec 16, 2024

On Mitigating Code LLM Hallucinations with API Documentation

Jul 13, 2024

Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems

Jul 04, 2024

Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

Jun 11, 2024

SemCoder: Training Code Language Models with Comprehensive Semantics

Jun 03, 2024

Training LLMs to Better Self-Debug and Explain Code

May 28, 2024