Aoxiong Yin

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

Mar 06, 2025
Via arXiv

T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

Jun 11, 2024
Via arXiv

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

Dec 23, 2023
Via arXiv

Language Model is a Branch Predictor for Simultaneous Machine Translation

Dec 22, 2023
Via arXiv

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System

Nov 23, 2023
Via arXiv

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding

Jul 25, 2023
Via arXiv

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

Jul 18, 2023
Via arXiv

Gloss Attention for Gloss-free Sign Language Translation

Jul 14, 2023
Via arXiv

Connecting Multi-modal Contrastive Representations

May 22, 2023
Via arXiv

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

Mar 09, 2023
Via arXiv