Picture for Yu Ying Chiu

Yu Ying Chiu

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

Add code
Oct 03, 2024
Viaarxiv icon

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Add code
Oct 03, 2024
Figure 1 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Figure 2 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Figure 3 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Figure 4 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Viaarxiv icon

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Add code
Jul 24, 2024
Figure 1 for WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Figure 2 for WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Figure 3 for WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Figure 4 for WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Viaarxiv icon

Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence

Add code
May 24, 2024
Viaarxiv icon

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

Add code
Apr 10, 2024
Figure 1 for CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Figure 2 for CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Figure 3 for CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Figure 4 for CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Viaarxiv icon

A Computational Framework for Behavioral Assessment of LLM Therapists

Add code
Jan 01, 2024
Viaarxiv icon

Humanoid Agents: Platform for Simulating Human-like Generative Agents

Add code
Oct 09, 2023
Viaarxiv icon