Xintong Li

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

May 11, 2026 (via arXiv)

Skill-R1: Agent Skill Evolution via Reinforcement Learning

May 10, 2026 (via arXiv)

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning

Feb 27, 2026 (via arXiv)

WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning

Feb 19, 2026 (via arXiv)

AMPS: Adaptive Modality Preference Steering via Functional Entropy

Feb 13, 2026 (via arXiv)

SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes

Jan 09, 2026 (via arXiv)

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

Jan 05, 2026 (via arXiv)

ASRL: A robust loss function with potential for development

Apr 09, 2025 (via arXiv)

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Apr 09, 2025 (via arXiv)

Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning

Mar 10, 2025 (via arXiv)