Dongmei Jiang

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Jun 12, 2025

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts

Jun 12, 2025

Cross-DINO: Cross the Deep MLP and Transformer for Small Object Detection

May 28, 2025

Open-Det: An Efficient Learning Framework for Open-Ended Detection

May 27, 2025

Harmony: A Unified Framework for Modality Incremental Learning

Apr 17, 2025

Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval

Apr 16, 2025

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Mar 17, 2025

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Feb 27, 2025

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Jan 25, 2025

CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation

Jan 20, 2025