Victor Veitch

RATE: Score Reward Models with Imperfect Rewrites of Rewrites

Oct 15, 2024, via arXiv

The Geometry of Categorical and Hierarchical Concepts in Large Language Models

Jun 03, 2024, via arXiv

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

Jun 02, 2024, via arXiv

On the Origins of Linear Representations in Large Language Models

Mar 06, 2024, via arXiv

Transforming and Combining Rewards for Aligning Large Language Models

Feb 01, 2024, via arXiv

The Linear Representation Hypothesis and the Geometry of Large Language Models

Nov 07, 2023, via arXiv

Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness

Oct 30, 2023, via arXiv

Uncovering Meanings of Embeddings via Partial Orthogonality

Oct 26, 2023, via arXiv

Concept Algebra for Text-Controlled Vision Models

Feb 07, 2023, via arXiv

Efficient Conditionally Invariant Representation Learning

Dec 16, 2022, via arXiv