Xuri Ge

LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation

Feb 19, 2025
Via arXiv

Multimodal Sentiment Analysis Based on Causal Reasoning

Dec 10, 2024
Via arXiv

Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation

Nov 05, 2024
Via arXiv

R^3AG: First Workshop on Refined and Reliable Retrieval Augmented Generation

Oct 27, 2024
Via arXiv

HpEIS: Learning Hand Pose Embeddings for Multimedia Interactive Systems

Oct 11, 2024
Via arXiv

Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

Aug 01, 2024
Via arXiv

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

May 26, 2024
Via arXiv

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

Apr 26, 2024
Via arXiv

IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT

Apr 11, 2024
Via arXiv

Text2Pic Swift: Enhancing Long-Text to Image Retrieval for Large-Scale Libraries

Feb 28, 2024
Via arXiv