Sinan Tan

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

Oct 02, 2024
Via arXiv

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Sep 18, 2024
Via arXiv

Qwen2 Technical Report

Jul 16, 2024
Via arXiv

Qwen Technical Report

Sep 28, 2023
Via arXiv

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Sep 14, 2023
Via arXiv

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Dec 08, 2022
Via arXiv

Mixed Neural Voxels for Fast Multi-view Video Synthesis

Dec 01, 2022
Via arXiv

Embodied Referring Expression for Manipulation Question Answering in Interactive Environment

Oct 06, 2022
Via arXiv

An Automated Question-Answering Framework Based on Evolution Algorithm

Jan 26, 2022
Via arXiv

Self-supervised 3D Semantic Representation Learning for Vision-and-Language Navigation

Jan 26, 2022
Via arXiv