Yuxuan Wang

Sherman

OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training

Mar 31, 2025
Via arXiv

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Mar 29, 2025
Via arXiv

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions

Mar 26, 2025
Via arXiv

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context

Mar 19, 2025
Via arXiv

A Parallel Hybrid Action Space Reinforcement Learning Model for Real-world Adaptive Traffic Signal Control

Mar 18, 2025
Via arXiv

PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture

Mar 14, 2025
Via arXiv

NsBM-GAT: A Non-stationary Block Maximum and Graph Attention Framework for General Traffic Crash Risk Prediction

Mar 06, 2025
Via arXiv

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

Feb 26, 2025
Via arXiv

The establishment of static digital humans and the integration with spinal models

Feb 11, 2025
Via arXiv

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Feb 06, 2025
Via arXiv