Thao Minh Le

Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models

Dec 11, 2024
Via arXiv

Unified Framework with Consistency across Modalities for Human Activity Recognition

Sep 04, 2024
Via arXiv

SADL: An Effective In-Context Learning Method for Compositional Visual QA

Jul 02, 2024
Via arXiv

Deep Neural Networks for Visual Reasoning

Sep 24, 2022
Via arXiv

Video Dialog as Conversation about Objects Living in Space-Time

Jul 08, 2022
Via arXiv

Guiding Visual Question Answering with Attention Priors

May 25, 2022
Via arXiv

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering

Jun 25, 2021
Via arXiv

Object-Centric Representation Learning for Video Question Answering

Apr 13, 2021
Via arXiv

Hierarchical Conditional Relation Networks for Multimodal Video Question Answering

Oct 18, 2020
Via arXiv

GEFA: Early Fusion Approach in Drug-Target Affinity Prediction

Sep 28, 2020
Via arXiv