Junjie Zhou

All You Need in Knowledge Distillation Is a Tailored Coordinate System

Dec 12, 2024
Via arXiv

Quantization without Tears

Nov 22, 2024
Via arXiv

DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing

Nov 14, 2024
Via arXiv

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Sep 24, 2024
Via arXiv

OmniGen: Unified Image Generation

Sep 17, 2024
Via arXiv

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval

Jun 06, 2024
Via arXiv

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

Jun 06, 2024
Via arXiv

TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

Aug 13, 2023
Via arXiv

DocDiff: Document Enhancement via Residual Diffusion Models

May 06, 2023
Via arXiv

SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation

Jan 17, 2023
Via arXiv