Boqing Gong

OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities

Oct 16, 2024
Via arXiv

$ε$-VAE: Denoising as Visual Decoding

Oct 05, 2024
Via arXiv

SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining

Sep 26, 2024
Via arXiv

On Discrete Prompt Optimization for Diffusion Models

Jun 27, 2024
Via arXiv

Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

Jun 05, 2024
Via arXiv

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

Jun 04, 2024
Via arXiv

Automatic Jailbreaking of the Text-to-Image Generative AI Systems

May 28, 2024
Via arXiv

Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

May 20, 2024
Via arXiv

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024
Via arXiv

Distilling Vision-Language Models on Millions of Videos

Jan 11, 2024
Via arXiv