
Jianmin Bao

MageBench: Bridging Large Multimodal Models to Agents

Dec 05, 2024
Via arXiv

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Nov 29, 2024
Via arXiv

REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Nov 20, 2024
Via arXiv

SynChart: Synthesizing Charts from Language Models

Sep 25, 2024
Via arXiv

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Jun 12, 2024
Via arXiv

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Dec 18, 2023
Via arXiv

Towards More Unified In-context Visual Understanding

Dec 05, 2023
Via arXiv

ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

Nov 30, 2023
Via arXiv

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Nov 30, 2023
Via arXiv

PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders

Nov 08, 2023
Via arXiv