Akifumi Wachi

MedGym: A Unified Continuous-Time Benchmark for Dynamic Medical Treatment Reinforcement Learning

May 31, 2026
Via arXiv

Interaction-Limited Safe Continuous-Time RL for Dynamical Medical Treatment

May 31, 2026
Via arXiv

How Neural Reward Models Learn Features for Policy Optimization: A Single-Index Analysis

May 23, 2026
Via arXiv

Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

Mar 16, 2026
Via arXiv

A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning

Feb 02, 2026
Via arXiv

Inference-Aware Meta-Alignment of LLMs via Non-Linear GRPO

Feb 02, 2026
Via arXiv

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

Nov 12, 2025
Via arXiv

A Provable Approach for End-to-End Safe Reinforcement Learning

May 28, 2025
Via arXiv

Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies

May 22, 2025
Via arXiv

Target Return Optimizer for Multi-Game Decision Transformer

Mar 04, 2025
Via arXiv