Yanzhe Zhang

Attacking Vision-Language Computer Agents via Pop-ups

Nov 04, 2024
Via arXiv

Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

Oct 21, 2024
Via arXiv

Distilling an End-to-End Voice Assistant Without Instruction Training Data

Oct 03, 2024
Via arXiv

TRINS: Towards Multimodal Language Models that Can Read

Jun 10, 2024
Via arXiv

Best Practices and Lessons Learned on Synthetic Data for Language Models

Apr 11, 2024
Via arXiv

Design2Code: How Far Are We From Automating Front-End Engineering?

Mar 05, 2024
Via arXiv

Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization

Oct 03, 2023
Via arXiv

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Jun 29, 2023
Via arXiv

Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints

Feb 17, 2023
Via arXiv

Auditing Gender Presentation Differences in Text-to-Image Models

Feb 08, 2023
Via arXiv