Juncheng Li

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

Mar 19, 2025

SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models

Mar 10, 2025

Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts

Mar 07, 2025

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

Mar 06, 2025

AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks

Feb 18, 2025

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation

Dec 28, 2024

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework

Dec 27, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Dec 13, 2024

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

Dec 09, 2024

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Dec 08, 2024