Alireza Fathi

Visual Lexicon: Rich Image Features in Language Space

Dec 09, 2024

Language-Guided Image Tokenization for Generation

Dec 08, 2024

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Oct 31, 2024

A Generative Approach for Wikipedia-Scale Visual Entity Recognition

Mar 04, 2024

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Mar 02, 2024

AVIS: Autonomous Visual Information Seeking with Large Language Models

Jun 13, 2023

Retrieval-Enhanced Contrastive Vision-Text Models

Jun 12, 2023

Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

Apr 11, 2023

Learning Object-Centric Neural Scattering Functions for Free-viewpoint Relighting and Scene Composition

Mar 10, 2023

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Dec 10, 2022