Xiaohan Zhang

Carl Zeiss Meditec AG

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Dec 30, 2024
Via arXiv

RobotDiffuse: Motion Planning for Redundant Manipulator based on Diffusion Model

Dec 27, 2024
Via arXiv

SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control

Dec 20, 2024
Via arXiv

Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories

Dec 13, 2024
Via arXiv

CogVLM2: Visual Language Models for Image and Video Understanding

Aug 29, 2024
Via arXiv

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Aug 12, 2024
Via arXiv

Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance

Aug 08, 2024
Via arXiv

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

Jun 25, 2024
Via arXiv

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

Jun 21, 2024
Via arXiv

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Jun 18, 2024
Via arXiv