Rui Miao

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

May 25, 2026 (via arXiv)

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

May 21, 2026 (via arXiv)

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

May 19, 2026 (via arXiv)

Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation

May 19, 2026 (via arXiv)

Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions

May 19, 2026 (via arXiv)

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

May 12, 2026 (via arXiv)

On the Step Length Confounding in LLM Reasoning Data Selection

Apr 08, 2026 (via arXiv)

ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning

Mar 17, 2026 (via arXiv)

Physics-informed Diffusion Generation for Geomagnetic Map Interpolation

Jan 31, 2026 (via arXiv)

Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation

Dec 24, 2025 (via arXiv)