Bolin Lai

ARGaze: Autoregressive Transformers for Online Egocentric Gaze Estimation

Feb 04, 2026

Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training

May 27, 2025

SocialGesture: Delving into Multi-person Gesture Understanding

Apr 03, 2025

Learning Predictive Visuomotor Coordination

Mar 30, 2025

Towards Online Multi-Modal Social Interaction Understanding

Mar 25, 2025

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Jan 08, 2025

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

Dec 03, 2024

Human Action Anticipation: A Survey

Oct 17, 2024

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Jun 24, 2024

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Jun 14, 2024