Abstract:Collective communication systems such as MPI offer high performance group communication primitives at the cost of application flexibility. Today, an increasing number of distributed applications (e.g., reinforcement learning) require flexibility in expressing dynamic and asynchronous communication patterns. To accommodate these applications, task-based distributed computing frameworks (e.g., Ray, Dask, Hydro) have become popular as they allow applications to dynamically specify communication by invoking tasks, or functions, at runtime. This design makes efficient collective communication challenging because (1) the group of communicating processes is chosen at runtime, and (2) processes may not all be ready at the same time. We design and implement Hoplite, a communication layer for task-based distributed systems that achieves high performance collective communication without compromising application flexibility. The key idea of Hoplite is to use distributed protocols to compute a data transfer schedule on the fly. This enables the same optimizations used in traditional collective communication, but for applications that specify the communication incrementally. We show that Hoplite achieves performance comparable to a traditional collective communication library, MPICH. We port a popular distributed computing framework, Ray, atop Hoplite. We show that Hoplite can speed up asynchronous parameter server and distributed reinforcement learning workloads that are difficult to execute efficiently with traditional collective communication by up to 8.1x and 3.9x, respectively.
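A minimal Ray-style sketch (not Hoplite's API) of the kind of incrementally specified communication the abstract describes: the reduction order depends on which workers finish first, so it cannot be declared up front as a static MPI collective.

    # Ray-style sketch of a dynamic reduction; the names and structure are
    # illustrative, not Hoplite's interface. The reduction tree is chosen at
    # runtime from whichever results become ready first.
    import ray

    ray.init()

    @ray.remote
    def compute_gradient(shard):
        return float(shard)  # stand-in for a real gradient computation

    @ray.remote
    def add(x, y):
        return x + y

    pending = [compute_gradient.remote(i) for i in range(8)]
    while len(pending) > 1:
        ready, pending = ray.wait(pending, num_returns=2)  # whichever two finish first
        pending = list(pending) + [add.remote(*ready)]
    grad_sum = ray.get(pending[0])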
Abstract:Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems; however, a disadvantage of MCTS is that it estimates the values of states with Monte Carlo averages, stored in a search tree, and this does not scale to games with very high branching factors. We propose an alternative simulation-based search method, Policy Gradient Search (PGS), which adapts a neural network simulation policy online via policy gradient updates, avoiding the need for a search tree. In Hex, PGS achieves comparable performance to MCTS, and an agent trained using Expert Iteration with PGS was able to defeat MoHex 2.0, the strongest open-source Hex agent, in 9x9 Hex.
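A hedged sketch of the core idea: rather than storing Monte Carlo value averages in a tree, the simulation policy's parameters are nudged by a REINFORCE-style policy gradient step after each simulated game. The environment interface and all names below are assumptions for illustration, not the paper's code.

    # Illustrative online policy-gradient update of a simulation policy,
    # in the spirit of Policy Gradient Search (env interface is assumed).
    import torch

    def pgs_simulation(policy, env, optimizer):
        log_probs = []
        state, done = env.reset(), False
        while not done:
            logits = policy(torch.as_tensor(state, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            state, outcome, done = env.step(action.item())
        # The terminal outcome (e.g. +1 win / -1 loss) weights every step.
        loss = -outcome * torch.stack(log_probs).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return outcome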
Abstract:Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(lambda). We address the second challenge by using a trust region optimization procedure for both the policy and the value function, which are represented by neural networks. Our approach yields strong empirical results on highly challenging 3D locomotion tasks, learning running gaits for bipedal and quadrupedal simulated robots, and learning a policy for getting the biped to stand up from starting out lying on the ground. In contrast to a body of prior work that uses hand-crafted policy representations, our neural network policies map directly from raw kinematics to joint torques. Our algorithm is fully model-free, and the amount of simulated experience required for the learning tasks on 3D bipeds corresponds to 1-2 weeks of real time.
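Concretely, with temporal-difference residuals computed from the learned value function V, the exponentially-weighted advantage estimator referred to above (generalized advantage estimation) takes the form

    \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
    \qquad
    \hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\, \delta_{t+l}

so lambda = 0 recovers the one-step TD residual and lambda = 1 the Monte Carlo advantage estimate, trading bias against variance.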
Abstract:The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray---a distributed system to address them. Ray implements a unified interface that can express both task-parallel and actor-based computations, supported by a single dynamic execution engine. To meet the performance requirements, Ray employs a distributed scheduler and a distributed and fault-tolerant store to manage the system's control state. In our experiments, we demonstrate scaling beyond 1.8 million tasks per second and better performance than existing specialized systems for several challenging reinforcement learning applications.
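A minimal sketch of the unified interface described above, using Ray's public @ray.remote task and actor primitives (the example itself is illustrative):

    import ray

    ray.init()

    @ray.remote
    def square(x):            # stateless task
        return x * x

    @ray.remote
    class Counter:            # stateful actor
        def __init__(self):
            self.total = 0

        def add(self, x):
            self.total += x
            return self.total

    futures = [square.remote(i) for i in range(4)]   # tasks execute in parallel
    counter = Counter.remote()
    print(ray.get(futures))                          # [0, 1, 4, 9]
    print(ray.get(counter.add.remote(10)))           # 10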
Abstract:Modern machine learning algorithms are increasingly computationally demanding, requiring specialized hardware and distributed computation to achieve high performance in a reasonable time frame. Many hyperparameter search algorithms have been proposed for improving the efficiency of model selection; however, their adaptation to the distributed compute environment is often ad hoc. We propose Tune, a unified framework for model selection and training that provides a narrow-waist interface between training scripts and search algorithms. We show that this interface meets the requirements for a broad range of hyperparameter search algorithms, allows straightforward scaling of search to large clusters, and simplifies algorithm implementation. We demonstrate the implementation of several state-of-the-art hyperparameter search algorithms in Tune. Tune is available at http://ray.readthedocs.io/en/latest/tune.html.
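One way to picture the narrow-waist interface (an illustrative sketch, not necessarily Tune's exact API): the training script only consumes a configuration and emits metrics through a single reporting callback, which is all any search algorithm needs to drive it.

    # Sketch of a narrow-waist interface between a training script and a
    # search algorithm; the training body is a stand-in, not Tune's code.
    import random

    def train_model(config, report):
        accuracy = 0.0
        for epoch in range(config["num_epochs"]):
            accuracy += config["lr"] * random.random()   # stand-in for one epoch of training
            report(epoch=epoch, mean_accuracy=accuracy)  # sole coupling point

    # A search algorithm only supplies configs and consumes reported metrics:
    train_model({"lr": 0.1, "num_epochs": 3}, report=lambda **m: print(m))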
Abstract:Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available at https://rllib.io/.
Abstract:Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.
Abstract:We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.
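At each iteration, the practical algorithm maximizes a surrogate objective subject to a KL-divergence trust-region constraint (with advantage estimates \hat{A}_t and step size \delta):

    \max_{\theta}\; \mathbb{E}_t\!\left[\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat{A}_t\right]
    \quad \text{subject to} \quad
    \mathbb{E}_t\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t)\,\|\,\pi_{\theta}(\cdot \mid s_t)\big)\right] \le \delta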
Abstract:We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013). We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to high levels of precision. Furthermore, we show that our algorithm performs well for a wide range of step sizes, often differing by several orders of magnitude.
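In outline (a hedged sketch following the abstract's cited ingredients), each inner iteration combines an SVRG-style variance-reduced gradient with an L-BFGS inverse-Hessian approximation H_t:

    v_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(\tilde{w}) + \nabla F(\tilde{w}),
    \qquad
    w_{t+1} = w_t - \eta\, H_t\, v_t

where \tilde{w} is the most recent snapshot iterate at which the full gradient \nabla F was computed.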
Abstract:Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this reason, leveraging the resources of a cluster to speed up training is an important area of work. However, widely popular batch-processing computational frameworks like MapReduce and Spark were not designed to support the asynchronous and communication-intensive workloads of existing distributed deep learning systems. We introduce SparkNet, a framework for training deep networks in Spark. Our implementation includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library. Using a simple parallelization scheme for stochastic gradient descent, SparkNet scales well with the cluster size and tolerates very high-latency communication. Furthermore, it is easy to deploy and use with no parameter tuning, and it is compatible with existing Caffe models. We quantify the dependence of the speedup obtained by SparkNet on the number of machines, the communication frequency, and the cluster's communication overhead, and we benchmark our system's performance on the ImageNet dataset.
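The "simple parallelization scheme" referred to above is iterative parameter averaging; a Python outline of the idea follows (SparkNet itself is implemented in Scala atop Spark and Caffe, so this is a sketch, not its code).

    # Sketch of iterative parameter averaging: broadcast the current model,
    # let each worker run a fixed number of local SGD steps on its data
    # partition, then average the resulting parameters on the driver.
    import numpy as np

    def local_sgd(params, data_partition, num_steps, lr=0.01):
        for _ in range(num_steps):
            params = params - lr * np.random.randn(*params.shape)  # stand-in for a gradient step
        return params

    def averaging_round(params, partitions, num_steps):
        updated = [local_sgd(params.copy(), part, num_steps) for part in partitions]
        return np.mean(updated, axis=0)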