Eric Wallace

GPT-4o System Card

Oct 25, 2024

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

Jun 28, 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Apr 19, 2024

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Mar 08, 2024

What Evidence Do Language Models Find Convincing?

Feb 19, 2024

Scalable Extraction of Training Data from (Production) Language Models

Nov 28, 2023

Privacy Side Channels in Machine Learning Systems

Sep 11, 2023

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

Aug 08, 2023

The False Promise of Imitating Proprietary LLMs

May 25, 2023

Poisoning Language Models During Instruction Tuning

May 01, 2023