Yueting Zhuang

MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task

Feb 17, 2025

STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion

Feb 10, 2025

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Jan 08, 2025

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Jan 03, 2025

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation

Dec 28, 2024

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework

Dec 27, 2024

Meta-Reflection: A Feedback-Free Reflection Learning Framework

Dec 18, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Dec 13, 2024

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

Dec 09, 2024

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Nov 29, 2024