Yezhou Yang

Arizona State University

TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark

Mar 17, 2025

Generative AI in Transportation Planning: A Survey

Mar 10, 2025

Biomedical Foundation Model: A Survey

Mar 03, 2025

Dual Caption Preference Optimization for Diffusion Models

Feb 09, 2025

Steering Rectified Flow Models in the Vector Field for Controlled Image Generation

Nov 27, 2024

Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model

Nov 07, 2024

TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives

Nov 04, 2024

Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?

Oct 17, 2024

ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions

Oct 17, 2024

VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks

Oct 17, 2024