Zitong Yu

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Mar 12, 2026

$Δ$VLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation

Mar 09, 2026

AULLM++: Structural Reasoning with Large Language Models for Micro-Expression Recognition

Mar 09, 2026

Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning

Feb 03, 2026

High-Resolution Underwater Camouflaged Object Detection: GBU-UCOD Dataset and Topology-Aware and Frequency-Decoupled Networks

Feb 03, 2026

CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge

Jan 29, 2026

Learning Representation and Synergy Invariances: A Provable Framework for Generalized Multimodal Face Anti-Spoofing

Nov 18, 2025

SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition

Nov 13, 2025

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?

Nov 13, 2025

SPASHT: An image-enhancement method for sparse-view MPI SPECT

Nov 09, 2025