Zhixiong Zeng

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

Feb 05, 2026
Via arXiv

Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction

Jan 31, 2026
Via arXiv

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Jan 29, 2026
Via arXiv

MobileDreamer: Generative Sketch World Model for GUI Agent

Jan 07, 2026
Via arXiv

Learning When to Look: A Disentangled Curriculum for Strategic Perception in Multimodal Reasoning

Dec 19, 2025
Via arXiv

UItron: Foundational GUI Agent with Advanced Perception and Planning

Aug 29, 2025
Via arXiv

DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios

Aug 01, 2025
Via arXiv

ScaleTrack: Scaling and back-tracking Automated GUI Agents

May 01, 2025
Via arXiv

Learning Multi-Stage Multi-Grained Semantic Embeddings for E-Commerce Search

Mar 20, 2023
Via arXiv

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

Jan 08, 2022
Via arXiv