Lumin Xu

Autoregressive Video Autoencoder with Decoupled Temporal and Spatial Context

Dec 12, 2025

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Mar 27, 2025

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Nov 04, 2024

TCFormer: Visual Recognition via Token Clustering Transformer

Jul 16, 2024

F-LMM: Grounding Frozen Large Multimodal Models

Jun 09, 2024

UniFS: Universal Few-shot Instance Perception with Point Representations

Apr 30, 2024

CLIM: Contrastive Language-Image Mosaic for Region Representation

Dec 19, 2023

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face

Oct 10, 2023

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Oct 02, 2023

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

Aug 28, 2023