Yi Zhu

Benchmarking Table Comprehension In The Wild

Dec 13, 2024
Via arXiv

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

Dec 03, 2024
Via arXiv

VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation

Nov 14, 2024
Via arXiv

Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge

Oct 09, 2024
Via arXiv

Differential Transformer

Oct 07, 2024
Via arXiv

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024
Via arXiv

Cross-Organ Domain Adaptive Neural Network for Pancreatic Endoscopic Ultrasound Image Segmentation

Sep 07, 2024
Via arXiv

UNIT: Unifying Image and Text Recognition in One Vision Encoder

Sep 06, 2024
Via arXiv

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

Jul 26, 2024
Via arXiv

WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model

Jun 26, 2024
Via arXiv