Picture for Jaemin Cho

Jaemin Cho

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

Add code
Nov 22, 2024
Figure 1 for VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
Figure 2 for VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
Figure 3 for VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
Figure 4 for VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
Viaarxiv icon

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Add code
Nov 07, 2024
Figure 1 for M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Figure 2 for M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Figure 3 for M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Figure 4 for M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Viaarxiv icon

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

Add code
Oct 08, 2024
Viaarxiv icon

DOCCI: Descriptions of Connected and Contrasting Images

Add code
Apr 30, 2024
Viaarxiv icon

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Add code
Apr 15, 2024
Viaarxiv icon

Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts

Add code
Mar 31, 2024
Viaarxiv icon

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents

Add code
Mar 18, 2024
Figure 1 for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Figure 2 for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Figure 3 for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Figure 4 for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Viaarxiv icon

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Add code
Mar 11, 2024
Figure 1 for SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Figure 2 for SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Figure 3 for SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Figure 4 for SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Viaarxiv icon

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training

Add code
Mar 04, 2024
Viaarxiv icon

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

Add code
Oct 30, 2023
Figure 1 for Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Figure 2 for Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Figure 3 for Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Figure 4 for Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Viaarxiv icon