Pengxiang Li

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Jan 08, 2025

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Dec 20, 2024

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Dec 18, 2024

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models

Dec 02, 2024

DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting

Nov 26, 2024

Task-oriented Sequential Grounding in 3D Scenes

Aug 07, 2024

FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

Jul 16, 2024

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

May 28, 2024

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

Apr 16, 2024

TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models

Dec 01, 2023