
Wentao Bao

Visual Large Language Models for Generalized and Specialized Applications

Jan 06, 2025
Via arXiv

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

Nov 17, 2024
Via arXiv

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Sep 22, 2024
Via arXiv

MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos

Sep 04, 2024
Via arXiv

Facial Affective Behavior Analysis with Instruction Tuning

Apr 07, 2024
Via arXiv

Latent Space Energy-based Model for Fine-grained Open Set Recognition

Sep 19, 2023
Via arXiv

On Model Explanations with Transferable Neural Pathways

Sep 18, 2023
Via arXiv

Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

Jul 17, 2023
Via arXiv

Prompting Language-Informed Distribution for Compositional Zero-Shot Learning

May 23, 2023
Via arXiv

Towards Open Set Video Anomaly Detection

Aug 23, 2022
Via arXiv