Xinyuan Zhang

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Jan 30, 2026
Via arXiv

Unified Multimodal and Multilingual Retrieval via Multi-Task Learning with NLU Integration

Jan 21, 2026
Via arXiv

SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning

Oct 02, 2025
Via arXiv

Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage

Oct 02, 2025
Via arXiv

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Aug 08, 2025
Via arXiv

LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs

May 06, 2025
Via arXiv

AttFC: Attention Fully-Connected Layer for Large-Scale Face Recognition with One GPU

Mar 10, 2025
Via arXiv

MSConv: Multiplicative and Subtractive Convolution for Face Recognition

Mar 08, 2025
Via arXiv

RVAFM: Re-parameterizing Vertical Attention Fusion Module for Handwritten Paragraph Text Recognition

Mar 05, 2025
Via arXiv

Deep Learning-Based Diffusion MRI Tractography: Integrating Spatial and Anatomical Information

Mar 05, 2025
Via arXiv