Renrui Zhang

Chimera: Improving Generalist Model with Domain-Specific Experts

Dec 08, 2024
Via arXiv

Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

Nov 27, 2024
Via arXiv

Point Cloud Understanding via Attention-Driven Contrastive Learning

Nov 22, 2024
Via arXiv

Training-free Regional Prompting for Diffusion Transformers

Nov 04, 2024
Via arXiv

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

Oct 10, 2024
Via arXiv

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Aug 29, 2024
Via arXiv

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024
Via arXiv

MAVIS: Mathematical Visual Instruction Tuning

Jul 11, 2024
Via arXiv

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Jul 10, 2024
Via arXiv

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation

Jun 06, 2024
Via arXiv