Daniil Tiapkin

CMAP, LMO

Learning Shortest Paths with Generative Flow Networks

Mar 02, 2026
Via arXiv

Beyond Softmax and Entropy: Improving Convergence Guarantees of Policy Gradients by f-SoftArgmax Parameterization with Coupled Regularization

Jan 18, 2026
Via arXiv

On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment

May 29, 2025
Via arXiv

Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games

May 28, 2025
Via arXiv

Accelerating Nash Learning from Human Feedback via Mirror Prox

May 26, 2025
Via arXiv

Revisiting Non-Acyclic GFlowNets in Discrete Environments

Feb 11, 2025
Via arXiv

On Teacher Hacking in Language Model Distillation

Feb 04, 2025
Via arXiv

Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents

Oct 30, 2024
Via arXiv

Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization

Oct 20, 2024
Via arXiv

Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

Jul 08, 2024
Via arXiv