Joemon M. Jose

CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation

Apr 14, 2025

Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis

Apr 14, 2025

Enhancing Interpretability in Generative AI Through Search-Based Data Influence Analysis

Apr 02, 2025

LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation

Feb 19, 2025

Large Language Model driven Policy Exploration for Recommender Systems

Jan 23, 2025

Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation

Add code
Nov 05, 2024
Figure 1 for Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
Figure 2 for Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
Figure 3 for Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
Figure 4 for Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
Viaarxiv icon

R^3AG: First Workshop on Refined and Reliable Retrieval Augmented Generation

Oct 27, 2024

Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

Aug 01, 2024

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

May 26, 2024

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

Apr 26, 2024