Jianmin Bao

SmartEraser: Remove Anything from Images using Masked-Region Guidance

Jan 14, 2025

MageBench: Bridging Large Multimodal Models to Agents

Dec 05, 2024

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Nov 29, 2024

REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Nov 20, 2024

SynChart: Synthesizing Charts from Language Models

Sep 25, 2024

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Jun 12, 2024

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Dec 18, 2023

Towards More Unified In-context Visual Understanding

Dec 05, 2023

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Nov 30, 2023

ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models

Nov 30, 2023