Zhihan Zhang

Remote Sensing Semantic Segmentation Quality Assessment based on Vision Language Model

Feb 19, 2025

IHEval: Evaluating Language Models on Following the Instruction Hierarchy

Feb 12, 2025

Adaptivity can help exponentially for shadow tomography

Dec 26, 2024

MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

Oct 18, 2024

On the sample complexity of purity and inner product estimation

Oct 16, 2024

Enhancing Mathematical Reasoning in LLMs by Stepwise Correction

Oct 16, 2024

TOWER: Tree Organized Weighting for Evaluating Complex Instructions

Oct 08, 2024

LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks

Oct 02, 2024

Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation

Aug 13, 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Jun 26, 2024