Jonathan Huang

MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

Feb 18, 2025
Via arXiv

Learning Complex Non-Rigid Image Edits from Multimodal Conditioning

Dec 13, 2024
Via arXiv

Principles of Visual Tokens for Efficient Video Understanding

Nov 20, 2024
Via arXiv

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Oct 09, 2024
Via arXiv

Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors

Jul 14, 2024
Via arXiv

Learning Hierarchical Semantic Classification by Grounding on Consistent Image Segmentations

Jun 17, 2024
Via arXiv

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023
Via arXiv

Text and Click Inputs for Unambiguous Open Vocabulary Instance Segmentation

Nov 24, 2023
Via arXiv

Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

Jun 07, 2023
Via arXiv

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Jun 02, 2023
Via arXiv