Xihan Wei

HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding

Jan 25, 2025

Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis

Jan 16, 2025

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness

Jan 14, 2025

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

Jan 09, 2025

Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models

Oct 25, 2024

DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

Apr 09, 2024

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

Jul 27, 2022

SP-ViT: Learning 2D Spatial Priors for Vision Transformers

Jun 15, 2022

Continual Local Replacement for Few-shot Image Recognition

Jan 23, 2020

Learning Continually from Low-shot Data Stream

Sep 04, 2019