Zhuohan Xie

Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking

Feb 27, 2026

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Feb 19, 2026

The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems

Feb 11, 2026

RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid?

Feb 06, 2026

A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding

Jan 13, 2026

FinCARDS: Card-Based Analyst Reranking for Financial Document Question Answering

Jan 11, 2026

FRaN-X: FRaming and Narratives-eXplorer

Jul 09, 2025

VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration

May 26, 2025

LLM-BABYBENCH: Understanding and Evaluating Grounded Planning and Reasoning in LLMs

May 17, 2025

A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs

May 13, 2025