
Jiwen Zhang

MAGNET: Towards Adaptive GUI Agents with Memory-Driven Knowledge Evolution

Jan 27, 2026

A Graph Prompt Fine-Tuning Method for WSN Spatio-Temporal Correlation Anomaly Detection

Jan 19, 2026

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation

Jan 11, 2026

A robust and compliant robotic assembly control strategy for batch precision assembly task with uncertain fit types and fit amounts

Aug 17, 2025

AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs

May 27, 2025

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Oct 07, 2024

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

May 28, 2024

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Apr 02, 2024

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Mar 05, 2024

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

Oct 17, 2023