Xiangtai Li

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

Feb 18, 2025

UMC: Unified Resilient Controller for Legged Robots with Joint Malfunctions

Feb 05, 2025

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Jan 08, 2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Jan 07, 2025

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Dec 10, 2024

EMOv2: Pushing 5M Vision Model Frontier

Dec 09, 2024

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Dec 05, 2024

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Dec 05, 2024

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

Dec 04, 2024

RelationBooth: Towards Relation-Aware Customized Object Generation

Oct 30, 2024