Proximal Policy Optimization (PPO)

Paper

Proximal Policy Optimization Algorithms [1]

Framework(s)

PyTorch

TensorFlow

API Reference

garage.torch.algos.PPO

garage.tf.algos.PPO

Code

garage/torch/algos/ppo.py

garage/tf/algos/ppo.py

Examples

examples

Proximal Policy Optimization (PPO) is a family of policy gradient methods that alternate between sampling data through interaction with the environment and optimizing a “surrogate” objective function using stochastic gradient ascent.
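
For reference, the clipped surrogate objective from [1] (a standard statement of the objective, not specific to garage's implementation) is

    L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\mathrm{old}}(a_t \mid s_t)},

where \hat{A}_t is an estimate of the advantage at time step t and \epsilon is the clipping range.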

Garage’s implementation also supports adding an entropy bonus to the objective. Two entropy approaches can be used: the maximum entropy approach adds the dense (per-time-step) entropy to the reward, while entropy regularization adds the mean entropy to the surrogate objective. See [2] for more details.
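
As a rough sketch, the entropy behavior is selected through keyword arguments of the PPO constructor. The argument names below follow garage's torch VPG/PPO signature in recent releases and may differ between versions, so treat them as an assumption and check the API reference above.

# Sketch of the two entropy configurations (argument names assumed from
# garage's torch VPG/PPO constructor; verify against the installed version).
max_entropy_kwargs = dict(
    entropy_method='max',        # add the dense entropy to the reward each step
    policy_ent_coeff=0.01,       # weight of the entropy term (illustrative value)
    stop_entropy_gradient=True,  # garage expects this when entropy_method='max'
    center_adv=False,            # garage expects this when entropy_method='max'
)
entropy_regularized_kwargs = dict(
    entropy_method='regularized',  # add the mean entropy to the surrogate objective
    policy_ent_coeff=0.01,
)
# Either dict would be splatted into the constructor, e.g.
# PPO(env_spec=env.spec, ..., **max_entropy_kwargs), as in the examples below.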

Examples

Garage has implementations of PPO with PyTorch and TensorFlow.

PyTorch
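
The following is a condensed sketch based on garage's examples/torch/ppo_pendulum.py. The environment name, network sizes, and the requirement to pass a sampler to the algorithm reflect one recent garage release and may need adjusting for other versions.

#!/usr/bin/env python3
"""Minimal PPO (PyTorch) example on InvertedDoublePendulum-v2."""
import torch

from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.torch.algos import PPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer


@wrap_experiment
def ppo_pendulum(ctxt=None, seed=1):
    """Train PPO on InvertedDoublePendulum-v2."""
    set_seed(seed)
    env = GymEnv('InvertedDoublePendulum-v2')
    trainer = Trainer(ctxt)

    # Gaussian MLP policy and value function, as in the shipped example.
    policy = GaussianMLPPolicy(env.spec,
                               hidden_sizes=[64, 64],
                               hidden_nonlinearity=torch.tanh,
                               output_nonlinearity=None)
    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    # Recent garage releases require the sampler to be passed to the algorithm;
    # older releases construct it internally.
    sampler = LocalSampler(agents=policy,
                           envs=env,
                           max_episode_length=env.spec.max_episode_length)

    algo = PPO(env_spec=env.spec,
               policy=policy,
               value_function=value_function,
               sampler=sampler,
               discount=0.99,
               center_adv=False)

    trainer.setup(algo, env)
    trainer.train(n_epochs=100, batch_size=10000)


ppo_pendulum(seed=1)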

TensorFlow
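
Similarly, a condensed sketch based on garage's examples/tf/ppo_pendulum.py. It assumes the TFTrainer and baseline API of a recent release, so constructor arguments (including the sampler and is_tf_worker flag) may need adjusting for other versions.

#!/usr/bin/env python3
"""Minimal PPO (TensorFlow) example on InvertedDoublePendulum-v2."""
from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.sampler import LocalSampler
from garage.tf.algos import PPO
from garage.tf.baselines import GaussianMLPBaseline
from garage.tf.policies import GaussianMLPPolicy
from garage.trainer import TFTrainer


@wrap_experiment
def ppo_pendulum(ctxt=None, seed=1):
    """Train PPO on InvertedDoublePendulum-v2."""
    set_seed(seed)
    with TFTrainer(snapshot_config=ctxt) as trainer:
        env = GymEnv('InvertedDoublePendulum-v2')

        policy = GaussianMLPPolicy(env_spec=env.spec,
                                   hidden_sizes=(64, 64))
        baseline = GaussianMLPBaseline(env_spec=env.spec,
                                       hidden_sizes=(32, 32))

        # As in the PyTorch example, newer releases take the sampler explicitly;
        # is_tf_worker is assumed to be supported by the installed version.
        sampler = LocalSampler(agents=policy,
                               envs=env,
                               max_episode_length=env.spec.max_episode_length,
                               is_tf_worker=True)

        algo = PPO(env_spec=env.spec,
                   policy=policy,
                   baseline=baseline,
                   sampler=sampler,
                   discount=0.99,
                   gae_lambda=0.95,
                   lr_clip_range=0.2)

        trainer.setup(algo, env)
        trainer.train(n_epochs=120, batch_size=2048)


ppo_pendulum(seed=1)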

References

[1] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[2] Sergey Levine. Reinforcement learning and control as probabilistic inference: tutorial and review. arXiv preprint arXiv:1805.00909, 2018.


This page was authored by Ruofu Wang (@yeukfu).