Junbo Zhang

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding

Mar 25, 2026

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

Mar 24, 2026

DashengTokenizer: One layer is enough for unified audio understanding and generation

Feb 27, 2026

Scaling World Model for Hierarchical Manipulation Policies

Feb 12, 2026

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Dec 31, 2025

JoyAgent-JDGenie: Technical Report on the GAIA

Oct 01, 2025

MiDashengLM: Efficient Audio Understanding with General Audio Captions

Aug 06, 2025

Unified Vision-Language-Action Model

Jun 24, 2025

Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders

Jun 13, 2025

GLAP: General contrastive audio-text pretraining across domains and languages

Jun 12, 2025