Gengyuan Zhang

Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs

Feb 21, 2025

Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries

Dec 26, 2024

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

Oct 07, 2024

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

Sep 30, 2024

Multimodal Pragmatic Jailbreak on Text-to-image Models

Sep 27, 2024

Localizing Events in Videos with Multimodal Queries

Jun 14, 2024

SPOT! Revisiting Video-Language Models for Event Understanding

Dec 01, 2023

Multi-event Video-Text Retrieval

Aug 22, 2023

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

Jul 24, 2023

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

Jul 12, 2023