Bohan Li

dots.tts Technical Report

Jun 05, 2026

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

Jun 05, 2026

Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

Jun 03, 2026

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

May 28, 2026

RAS: a Reliability Oriented Metric for Automatic Speech Recognition

Apr 28, 2026

MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis

Apr 13, 2026

PAM: A Pose-Appearance-Motion Engine for Sim-to-Real HOI Video Generation

Mar 23, 2026

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Feb 15, 2026

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Dec 29, 2025

BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction

Nov 08, 2025