
Yunyang Xiong

Small Vision-Language Models are Smart Compressors for Long Video Understanding

Apr 09, 2026

Neural Computers

Apr 07, 2026

Efficient Universal Perception Encoder

Mar 23, 2026

EgoAVU: Egocentric Audio-Visual Understanding

Feb 05, 2026

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Jan 08, 2026

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

Feb 04, 2025

EdgeTAM: On-Device Track Anything Model

Jan 13, 2025

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Dec 18, 2024

Efficient Track Anything

Nov 28, 2024

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Oct 22, 2024