Yanzhe Zhang

EgoNormia: Benchmarking Physical Social Norm Understanding

Feb 27, 2025

Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference

Dec 26, 2024

Attacking Vision-Language Computer Agents via Pop-ups

Nov 04, 2024

Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

Oct 21, 2024

Distilling an End-to-End Voice Assistant Without Instruction Training Data

Oct 03, 2024

TRINS: Towards Multimodal Language Models that Can Read

Jun 10, 2024

Best Practices and Lessons Learned on Synthetic Data for Language Models

Apr 11, 2024

Design2Code: How Far Are We From Automating Front-End Engineering?

Mar 05, 2024

Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization

Oct 03, 2023

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Jun 29, 2023