Xiaohan Zhang

Carl Zeiss Meditec AG

CogVLM2: Visual Language Models for Image and Video Understanding

Aug 29, 2024

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Aug 12, 2024

Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance

Aug 08, 2024

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

Jun 25, 2024

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

Jun 21, 2024

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Jun 18, 2024

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

Jun 14, 2024

LVBench: An Extreme Long Video Understanding Benchmark

Jun 12, 2024

Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection

May 25, 2024

Aerial-NeRF: Adaptive Spatial Partitioning and Sampling for Large-Scale Aerial Rendering

May 10, 2024