Renrui Zhang

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

Jan 23, 2025

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Jan 23, 2025

TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction

Dec 22, 2024

Chimera: Improving Generalist Model with Domain-Specific Experts

Dec 08, 2024

Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

Nov 27, 2024

Point Cloud Understanding via Attention-Driven Contrastive Learning

Nov 22, 2024

Training-free Regional Prompting for Diffusion Transformers

Nov 04, 2024

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

Oct 10, 2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Aug 29, 2024

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024