Boqing Gong

Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models

Jun 12, 2025
Via arXiv

Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck

May 30, 2025
Via arXiv

SITE: towards Spatial Intelligence Thorough Evaluation

May 08, 2025
Via arXiv

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning

Apr 13, 2025
Via arXiv

VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro

Apr 12, 2025
Via arXiv

DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments

Dec 28, 2024
Via arXiv

Neptune: The Long Orbit to Benchmarking Long Video Understanding

Dec 12, 2024
Via arXiv

Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space

Nov 27, 2024
Via arXiv

Extending Video Masked Autoencoders to 128 frames

Nov 20, 2024
Via arXiv

OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities

Oct 16, 2024
Via arXiv