
Jing Shi

GroundingBooth: Grounding Text-to-Image Customization

Sep 13, 2024

Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images

Aug 24, 2024

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Jun 11, 2024

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Apr 23, 2024

VIXEN: Visual Text Comparison Network for Image Difference Captioning

Mar 14, 2024

Text-to-Audio Generation Synchronized with Videos

Mar 08, 2024

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Feb 22, 2024

A Knowledge-enhanced Two-stage Generative Framework for Medical Dialogue Information Extraction

Jul 30, 2023

ViLaS: Integrating Vision and Language into Automatic Speech Recognition

May 31, 2023

DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment

May 22, 2023