Zhijing Jin

Evaluating Cooperation in LLM Social Groups through Elected Leadership

Apr 13, 2026

CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs

Mar 22, 2026

When Do Language Models Endorse Limitations on Human Rights Principles?

Mar 04, 2026

GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory

Feb 12, 2026

IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery

Feb 08, 2026

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

Feb 06, 2026

Fluid Representations in Reasoning Models

Feb 04, 2026

BinaryPPO: Efficient Policy Optimization for Binary Classification

Feb 02, 2026

Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification

Jan 29, 2026

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

Nov 13, 2025