Laura Weidinger

Toward an Evaluation Science for Generative AI Systems

Mar 07, 2025
Via arXiv

Do LLMs exhibit demographic parity in responses to queries about Human Rights?

Feb 26, 2025
Via arXiv

Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models

Feb 10, 2025
Via arXiv

Operationalizing Contextual Integrity in Privacy-Conscious Assistants

Aug 05, 2024
Via arXiv

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Jun 26, 2024
Via arXiv

STAR: SocioTechnical Approach to Red Teaming Language Models

Jun 17, 2024
Via arXiv

Holistic Safety and Responsibility Evaluations of Advanced AI Models

Apr 22, 2024
Via arXiv

Sociotechnical Safety Evaluation of Generative AI Systems

Oct 31, 2023
Via arXiv

Improving alignment of dialogue agents via targeted human judgements

Sep 28, 2022
Via arXiv

Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models

Jun 16, 2022
Via arXiv