Jianguo Cao

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Jun 14, 2024

A Survey on Multimodal Large Language Models for Autonomous Driving

Nov 21, 2023

ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial Diagnosis

Oct 30, 2022

AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation

May 11, 2022