Yi Zhu

VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation

Nov 14, 2024

Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge

Oct 09, 2024

Differential Transformer

Oct 07, 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024

Cross-Organ Domain Adaptive Neural Network for Pancreatic Endoscopic Ultrasound Image Segmentation

Sep 07, 2024

UNIT: Unifying Image and Text Recognition in One Vision Encoder

Sep 06, 2024

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

Jul 26, 2024

WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model

Jun 26, 2024

A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

Jun 17, 2024

UrBAN: Urban Beehive Acoustics and PheNotyping Dataset

Jun 05, 2024