Guan Pang

Jack

Movie Gen: A Cast of Media Foundation Models

Oct 17, 2024
Via arXiv

TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Oct 07, 2024
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv

SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models

Jun 03, 2024
Via arXiv

Animated Stickers: Bringing Stickers to Life with Video Diffusion

Feb 08, 2024
Via arXiv

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Dec 06, 2023
Via arXiv

DISGO: Automatic End-to-End Evaluation for Scene Text OCR

Aug 25, 2023
Via arXiv

Text-Conditional Contextualized Avatars For Zero-Shot Personalization

Apr 14, 2023
Via arXiv

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

Apr 28, 2022
Via arXiv

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer

Apr 07, 2022
Via arXiv