Zhuofan Xia

Bridging the Divide: Reconsidering Softmax and Linear Attention

Dec 09, 2024

Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data

Nov 23, 2024

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

Aug 11, 2024

Demystify Mamba in Vision: A Linear Attention Perspective

May 26, 2024

Agent Attention: On the Integration of Softmax and Linear Attention

Dec 22, 2023

GSVA: Generalized Segmentation via Multimodal Large Language Models

Dec 15, 2023

Generalized Activation via Multivariate Projection

Sep 29, 2023

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

Sep 04, 2023

Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention

Apr 09, 2023

Adaptive Rotated Convolution for Rotated Object Detection

Mar 14, 2023