Xiaozhong Lyu

RedPajama: an Open Dataset for Training Large Language Models

Nov 19, 2024
Via arXiv

EgoGen: An Egocentric Synthetic Data Generator

Jan 16, 2024
Via arXiv

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

Jul 06, 2023
Via arXiv