Zhizheng Zhang

Southeast University, China

Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

Dec 09, 2024
Via arXiv

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Dec 05, 2024
Via arXiv

A General Theory for Compositional Generalization

May 20, 2024
Via arXiv

Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

May 13, 2024
Via arXiv

VisualCritic: Making LMMs Perceive Visual Quality Like Humans

Mar 19, 2024
Via arXiv

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

Mar 19, 2024
Via arXiv

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Mar 01, 2024
Via arXiv

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Feb 29, 2024
Via arXiv

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

Oct 07, 2023
Via arXiv

Adaptive Frequency Filters As Efficient Global Token Mixers

Jul 26, 2023
Via arXiv