Hongsheng Li

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Dec 12, 2024
Via arXiv

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Dec 12, 2024
Via arXiv

StreamChat: Chatting with Streaming Video

Dec 11, 2024
Via arXiv

FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes

Dec 04, 2024
Via arXiv

TimeWalker: Personalized Neural Space for Lifelong Head Avatars

Dec 03, 2024
Via arXiv

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Dec 02, 2024
Via arXiv

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Nov 16, 2024
Via arXiv

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving

Nov 08, 2024
Via arXiv

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

Nov 04, 2024
Via arXiv

BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events

Oct 27, 2024
Via arXiv