
Tristan Thrush

Nearest Neighbor Normalization Improves Multimodal Retrieval

Oct 31, 2024
Via arXiv

Improving Pretraining Data Using Perplexity Correlations

Sep 09, 2024
Via arXiv

ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation

Feb 07, 2024
Via arXiv

I am a Strange Dataset: Metalinguistic Tests for Language Models

Jan 10, 2024
Via arXiv

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

Jun 28, 2023
Via arXiv

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Mar 07, 2023
Via arXiv

Measuring Data

Dec 09, 2022
Via arXiv

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022
Via arXiv

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Oct 06, 2022
Via arXiv

DataPerf: Benchmarks for Data-Centric AI Development

Jul 20, 2022
Via arXiv