Junfeng Fang

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Feb 04, 2026
Via arXiv

The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment

Feb 04, 2026
Via arXiv

RMBRec: Robust Multi-Behavior Recommendation towards Target Behaviors

Jan 13, 2026
Via arXiv

Contrastive Weak-to-strong Generalization

Oct 09, 2025
Via arXiv

On Predictability of Reinforcement Learning Dynamics for Large Language Models

Oct 02, 2025
Via arXiv

We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems

Jun 16, 2025
Via arXiv

Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs

Jun 16, 2025
Via arXiv

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Jun 08, 2025
Via arXiv

Are Reasoning Models More Prone to Hallucination?

May 29, 2025
Via arXiv

Advanced long-term earth system forecasting by learning the small-scale nature

May 26, 2025
Via arXiv