Tianjian Li

Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets

Oct 06, 2024 (via arXiv)

Benchmarking Language Model Creativity: A Case Study on Code Generation

Jul 12, 2024 (via arXiv)

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Apr 05, 2024 (via arXiv)

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

Oct 02, 2023 (via arXiv)

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

May 31, 2023 (via arXiv)

Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution

May 27, 2023 (via arXiv)