Yuzhang Shang

Real-Time Robot Execution with Masked Action Chunking

Jan 27, 2026

SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models

Jan 20, 2026

Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

Jan 15, 2026

PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache

Jan 07, 2026

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Dec 19, 2025

Distill Video Datasets into Images

Dec 16, 2025

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

Nov 17, 2025

Efficient Multimodal Dataset Distillation via Generative Models

Sep 18, 2025

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

Sep 11, 2025

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

Jul 27, 2025