Xiaohan Zhang

Carl Zeiss Meditec AG

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

Jun 13, 2025
Via arXiv

TraGraph-GS: Trajectory Graph-based Gaussian Splatting for Arbitrary Large-Scale Scene Rendering

Jun 10, 2025
Via arXiv

CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion

May 02, 2025
Via arXiv

Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding

Apr 20, 2025
Via arXiv

ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Jan 17, 2025
Via arXiv

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Dec 30, 2024
Via arXiv

RobotDiffuse: Motion Planning for Redundant Manipulator based on Diffusion Model

Dec 27, 2024
Via arXiv

SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control

Dec 20, 2024
Via arXiv

Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories

Dec 13, 2024
Via arXiv

CogVLM2: Visual Language Models for Image and Video Understanding

Aug 29, 2024
Via arXiv