Haopeng Li

Admitting Ignorance Helps the Video Question Answering Models to Answer

Jan 15, 2025
Via arXiv

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

Sep 29, 2024
Via arXiv

Scalable Autoregressive Image Generation with Mamba

Aug 22, 2024
Via arXiv

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Jun 05, 2024
Via arXiv

Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports

Jan 07, 2024
Via arXiv

Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

Jan 03, 2024
Via arXiv

Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation

Jul 30, 2022
Via arXiv

Video Crowd Localization with Multi-focus Gaussian Neighbor Attention and a Large-Scale Benchmark

Jul 20, 2021
Via arXiv

Reconstructive Sequence-Graph Network for Video Summarization

May 10, 2021
Via arXiv

Object Detection and 3D Estimation via an FMCW Radar Using a Fully Convolutional Network

Feb 04, 2019
Via arXiv