Pengchuan Zhang

Jack

TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Oct 07, 2024

The Llama 3 Herd of Models

Jul 31, 2024

Learning Video Context as Interleaved Multimodal Sequences

Jul 31, 2024

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

Jun 19, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

May 15, 2024

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Apr 01, 2024

The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

Nov 15, 2023

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

Oct 26, 2023

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

Sep 20, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding

Aug 18, 2023