
Cameron Tice

Chain-of-thought obfuscation learned from output supervision can generalise to unseen tasks

Jan 30, 2026
Via arXiv

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

Jan 15, 2026
Via arXiv

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

Dec 02, 2024
Via arXiv