Humphrey Shi

RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism

Apr 09, 2025
Via arXiv

Slow-Fast Architecture for Video Multi-Modal Large Language Models

Apr 02, 2025
Via arXiv

Safe Vision-Language Models via Unsafe Weights Manipulation

Mar 14, 2025
Via arXiv

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Feb 27, 2025
Via arXiv

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

Dec 26, 2024
Via arXiv

CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices

Dec 17, 2024
Via arXiv

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Dec 12, 2024
Via arXiv

GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models

Aug 29, 2024
Via arXiv

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Aug 28, 2024
Via arXiv

Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation

Aug 01, 2024
Via arXiv