Jun Song

GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery

Feb 15, 2026

Text Before Vision: Staged Knowledge Injection Matters for Agentic RLVR in Ultra-High-Resolution Remote Sensing Understanding

Feb 15, 2026

Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning

Feb 02, 2026

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Jan 21, 2026

Unified Thinker: A General Reasoning Modular Core for Image Generation

Jan 06, 2026

CaveAgent: Transforming LLMs into Stateful Runtime Operators

Jan 04, 2026

AndroidLens: Long-latency Evaluation with Nested Sub-targets for Android GUI Agents

Dec 24, 2025

Time-Layer Adaptive Alignment for Speaker Similarity in Flow-Matching Based Zero-Shot TTS

Nov 13, 2025

MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs

Aug 28, 2025

InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning

Aug 27, 2025