Alexander Bukharin

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Aug 21, 2025
Via arXiv

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

May 16, 2025
Via arXiv

Llama-Nemotron: Efficient Reasoning Models

May 02, 2025
Via arXiv

Adversarial Training of Reward Models

Apr 08, 2025
Via arXiv

HelpSteer2-Preference: Complementing Ratings with Preferences

Oct 02, 2024
Via arXiv

Robust Reinforcement Learning from Corrupted Human Feedback

Jun 21, 2024
Via arXiv

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Jun 04, 2024
Via arXiv

Data Diversity Matters for Robust Instruction Tuning

Nov 21, 2023
Via arXiv

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Oct 16, 2023
Via arXiv

Deep Reinforcement Learning from Hierarchical Weak Preference Feedback

Sep 06, 2023
Via arXiv