Juan Ciro

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Apr 24, 2024
Via arXiv

Speech Wikimedia: A 77 Language Multilingual Speech Dataset

Aug 30, 2023
Via arXiv

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models

May 22, 2023
Via arXiv

DataPerf: Benchmarks for Data-Centric AI Development

Jul 20, 2022
Via arXiv

LSH methods for data deduplication in a Wikipedia artificial dataset

Dec 10, 2021
Via arXiv

The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

Nov 17, 2021
Via arXiv