
Zhaowei Zhang

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment

Oct 22, 2024 (via arXiv)

Efficient Model-agnostic Alignment via Bayesian Persuasion

May 29, 2024 (via arXiv)

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024 (via arXiv)

Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects

Mar 01, 2024 (via arXiv)

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

Jan 19, 2024 (via arXiv)

AI Alignment: A Comprehensive Survey

Nov 01, 2023 (via arXiv)

Measuring Value Understanding in Language Models through Discriminator-Critique Gap

Oct 19, 2023 (via arXiv)

ProAgent: Building Proactive Cooperative AI with Large Language Models

Aug 28, 2023 (via arXiv)

Heterogeneous Value Evaluation for Large Language Models

Jun 01, 2023 (via arXiv)

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

Apr 15, 2023 (via arXiv)