Zhaowei Zhang

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment

Oct 22, 2024

Efficient Model-agnostic Alignment via Bayesian Persuasion

May 29, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects

Mar 01, 2024

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

Jan 19, 2024

AI Alignment: A Comprehensive Survey

Nov 01, 2023

Measuring Value Understanding in Language Models through Discriminator-Critique Gap

Oct 19, 2023

ProAgent: Building Proactive Cooperative AI with Large Language Models

Aug 28, 2023

Heterogeneous Value Evaluation for Large Language Models

Jun 01, 2023

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

Apr 15, 2023