Richang Hong

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

Dec 22, 2024
Via arXiv

Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Dec 19, 2024
Via arXiv

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

Dec 19, 2024
Via arXiv

Moderating the Generalization of Score-based Generative Model

Dec 10, 2024
Via arXiv

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observation

Nov 25, 2024
Via arXiv

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

Nov 25, 2024
Via arXiv

Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model

Nov 16, 2024
Via arXiv

DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

Oct 11, 2024
Via arXiv

Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval

Oct 09, 2024
Via arXiv

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

Sep 30, 2024
Via arXiv