
Lijuan Wang

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Apr 10, 2025

V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models

Apr 08, 2025

Measurement of LLM's Philosophies of Human Nature

Apr 03, 2025

Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models

Mar 26, 2025

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising

Mar 26, 2025

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

Mar 25, 2025

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Feb 25, 2025

TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

Feb 11, 2025

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

Jan 09, 2025

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Nov 26, 2024