Shi-Xiong Zhang

MemGym: a Long-Horizon Memory Environment for LLM Agents

May 20, 2026

Your Model Diversity, Not Method, Determines Reasoning Strategy

Apr 12, 2026

DIAL-SUMMER: A Structured Evaluation Framework of Hierarchical Errors in Dialogue Summaries

Feb 08, 2026

Routing with Generated Data: Annotation-Free LLM Skill Estimation and Expert Selection

Jan 14, 2026

Lessons from the Field: An Adaptable Lifecycle Approach to Applied Dialogue Summarization

Jan 13, 2026

Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models

Dec 16, 2025

Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs

Nov 13, 2025

T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning

May 22, 2025

Continual Pre-training of MoEs: How robust is your router?

Mar 06, 2025

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Oct 16, 2024