
Zhihan Zhang

MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

Oct 18, 2024

Enhancing Mathematical Reasoning in LLMs by Stepwise Correction

Oct 16, 2024

On the sample complexity of purity and inner product estimation

Oct 16, 2024

TOWER: Tree Organized Weighting for Evaluating Complex Instructions

Oct 08, 2024

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Oct 02, 2024

Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation

Aug 13, 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Jun 26, 2024

Hallucination Mitigation Prompts Long-term Video Understanding

Jun 17, 2024

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Jun 17, 2024

Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

Jun 04, 2024