Dongmei Jiang

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Mar 17, 2025

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Feb 27, 2025

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Jan 25, 2025

CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation

Jan 20, 2025

Transferable Adversarial Face Attack with Text Controlled Attribute

Dec 16, 2024

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Dec 01, 2024

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Nov 19, 2024

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

Oct 08, 2024

Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment

Sep 10, 2024

ExpLLM: Towards Chain of Thought for Facial Expression Recognition

Sep 04, 2024