Boqing Gong

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning

Apr 13, 2025
Via arXiv

VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro

Apr 12, 2025
Via arXiv

DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments

Dec 28, 2024
Via arXiv

Neptune: The Long Orbit to Benchmarking Long Video Understanding

Dec 12, 2024
Via arXiv

Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space

Nov 27, 2024
Via arXiv

Extending Video Masked Autoencoders to 128 frames

Nov 20, 2024
Via arXiv

OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities

Oct 16, 2024
Via arXiv

$ε$-VAE: Denoising as Visual Decoding

Oct 05, 2024
Via arXiv

SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining

Sep 26, 2024
Via arXiv

On Discrete Prompt Optimization for Diffusion Models

Jun 27, 2024
Via arXiv