Yuxuan Wang

Sherman

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context

Mar 19, 2025
Via arXiv

A Parallel Hybrid Action Space Reinforcement Learning Model for Real-world Adaptive Traffic Signal Control

Mar 18, 2025
Via arXiv

PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture

Mar 14, 2025
Via arXiv

NsBM-GAT: A Non-stationary Block Maximum and Graph Attention Framework for General Traffic Crash Risk Prediction

Mar 06, 2025
Via arXiv

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

Feb 26, 2025
Via arXiv

The establishment of static digital humans and the integration with spinal models

Feb 11, 2025
Via arXiv

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Feb 06, 2025
Via arXiv

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Jan 27, 2025
Via arXiv

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Jan 10, 2025
Via arXiv

LongViTU: Instruction Tuning for Long-Form Video Understanding

Jan 09, 2025
Via arXiv