David A. Ross

Language-Guided Image Tokenization for Generation

Dec 08, 2024

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Mar 02, 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Oct 09, 2023

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Jul 03, 2023

$IC^3$: Image Captioning by Committee Consensus

Feb 16, 2023

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

Dec 20, 2022

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Dec 10, 2022

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

May 12, 2022

Learn to Dance with AIST++: Music Conditioned 3D Dance Generation

Feb 02, 2021