Shahin Honarvar

Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations

Feb 05, 2026
Via arXiv

Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study

Apr 03, 2025
Via arXiv

Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code

Jan 14, 2024
Via arXiv