Zhihong Zhu

Enhancing Image Generation Fidelity via Progressive Prompts

Jan 13, 2025
Via arXiv

VASparse: Towards Efficient Visual Hallucination Mitigation for Large Vision-Language Model via Visual-Aware Sparsification

Jan 11, 2025
Via arXiv

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Dec 13, 2024
Via arXiv

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Sep 16, 2024
Via arXiv

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation

Sep 14, 2024
Via arXiv

Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective

Sep 11, 2024
Via arXiv

XMeCap: Meme Caption Generation with Sub-Image Adaptability

Jul 24, 2024
Via arXiv

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024
Via arXiv

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

Jun 26, 2024
Via arXiv

D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

Jun 18, 2024
Via arXiv