Yanqing Liu

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Jan 21, 2026

A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation

Jan 18, 2026

Next Tokens Denoising for Speech Synthesis

Jul 30, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling

Jun 14, 2025

Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling

May 26, 2025

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

May 07, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025

Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners

Dec 06, 2024

CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

Nov 25, 2024

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Nov 11, 2024