Zhihong Zhu

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Dec 13, 2024
Via arXiv

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Sep 16, 2024
Via arXiv

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation

Sep 14, 2024
Via arXiv

Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective

Sep 11, 2024
Via arXiv

XMeCap: Meme Caption Generation with Sub-Image Adaptability

Jul 24, 2024
Via arXiv

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024
Via arXiv

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

Jun 26, 2024
Via arXiv

D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

Jun 18, 2024
Via arXiv

Textual Inversion and Self-supervised Refinement for Radiology Report Generation

May 31, 2024
Via arXiv

Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

May 31, 2024
Via arXiv