Gengyuan Zhang

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

Oct 07, 2024

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

Sep 30, 2024

Multimodal Pragmatic Jailbreak on Text-to-image Models

Sep 27, 2024

Localizing Events in Videos with Multimodal Queries

Jun 14, 2024

SPOT! Revisiting Video-Language Models for Event Understanding

Dec 01, 2023

Multi-event Video-Text Retrieval

Aug 22, 2023

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

Jul 24, 2023

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

Jul 12, 2023

CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering

Nov 19, 2022