Xu-Cheng Yin

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles

Jan 02, 2025
Via arXiv

Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition

Jan 01, 2025
Via arXiv

I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception

Nov 20, 2024
Via arXiv

HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection

Sep 24, 2024
Via arXiv

HQOD: Harmonious Quantization for Object Detection

Aug 05, 2024
Via arXiv

Video-Language Alignment Pre-training via Spatio-Temporal Graph Transformer

Jul 16, 2024
Via arXiv

Arbitrary Time Information Modeling via Polynomial Approximation for Temporal Knowledge Graph Embedding

May 01, 2024
Via arXiv

Transformer-based Reasoning for Learning Evolutionary Chain of Events on Temporal Knowledge Graph

May 01, 2024
Via arXiv

Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling

Jan 08, 2024
Via arXiv

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

May 23, 2023
Via arXiv