Paavo Parmas

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Sep 02, 2024

A unified view of likelihood ratio and reparameterization gradients

May 31, 2021

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

Oct 14, 2019

Total stochastic gradient algorithms and applications in reinforcement learning

Feb 05, 2019

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

Feb 04, 2019