Hao Shao

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Dec 15, 2024

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Dec 12, 2024

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Oct 11, 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Apr 19, 2024

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Mar 25, 2024

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Mar 19, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Dec 21, 2023

MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention

Dec 20, 2023

Polyper: Boundary Sensitive Polyp Segmentation

Dec 14, 2023