Jindong Gu

FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings

Jan 11, 2025

Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs

Jan 11, 2025

SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation

Dec 13, 2024

Uncovering Vision Modality Threats in Image-to-Image Tasks

Dec 07, 2024

Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models

Dec 06, 2024

UVCG: Leveraging Temporal Consistency for Universal Video Protection

Nov 25, 2024

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Nov 22, 2024

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

Oct 07, 2024

Visual Question Decomposition on Multimodal Large Language Models

Sep 28, 2024

Multimodal Pragmatic Jailbreak on Text-to-image Models

Sep 27, 2024