
Diyi Yang

Stanford University

EgoNormia: Benchmarking Physical Social Norm Understanding

Feb 27, 2025
Via arXiv

Mind the Gap! Static and Interactive Evaluations of Large Audio Models

Feb 21, 2025
Via arXiv

EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking

Feb 18, 2025
Via arXiv

No Preference Left Behind: Group Distributional Preference Optimization

Dec 28, 2024
Via arXiv

Dynamic Skill Adaptation for Large Language Models

Dec 26, 2024
Via arXiv

Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

Dec 20, 2024
Via arXiv

Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors

Nov 12, 2024
Via arXiv

Attacking Vision-Language Computer Agents via Pop-ups

Nov 04, 2024
Via arXiv

Personalization of Large Language Models: A Survey

Oct 29, 2024
Via arXiv

Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

Oct 21, 2024
Via arXiv