Bo He

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning

Jan 23, 2025
Via arXiv

Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning

Dec 23, 2024
Via arXiv

UMono: Physical Model Informed Hybrid CNN-Transformer Framework for Underwater Monocular Depth Estimation

Jul 25, 2024
Via arXiv

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Apr 08, 2024
Via arXiv

OmniVid: A Generative Framework for Universal Video Understanding

Mar 26, 2024
Via arXiv

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

Nov 29, 2023
Via arXiv

Chop & Learn: Recognizing and Generating Object-State Compositions

Sep 25, 2023
Via arXiv

Towards Scalable Neural Representation for Diverse Videos

Mar 24, 2023
Via arXiv

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Mar 13, 2023
Via arXiv

CNeRV: Content-adaptive Neural Representation for Visual Data

Nov 18, 2022
Via arXiv