Paper Summary: Scaling Distributed Machine Learning with the Parameter Server


Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


Open-source blueprint (and canonical implementation) for a distributed, message-based server architecture for model-agnostic machine learning.

The proposed architecture focuses on training models; it does not mention inference.


Because, as of publication (2014), the authors claim that no other open-source framework supported distributed training of ML algorithms at the scale they required (on the order of hundreds of terabytes to petabytes).


Provides a blueprint for organizing a cluster of instances that operate together as a machine learning system, with features such as:

  • Distributed training/optimization (with SGD)

  • Updating trained models with more data
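The two features above can be sketched in a single process. This is a minimal illustration under stated assumptions, not the paper's implementation: `ParameterServer`, `worker_gradient`, and `train` are hypothetical names, the "workers" are simulated in a loop, the model is logistic regression, and each step is synchronous (the server averages every worker's minibatch gradient) for simplicity. Calling `train` a second time on new shards, starting from the current weights, stands in for updating a trained model with more data.

```python
import numpy as np


class ParameterServer:
    """Hypothetical single-process stand-in for the server holding global weights."""

    def __init__(self, dim, lr=0.5):
        self.weights = np.zeros(dim)
        self.lr = lr

    def apply(self, gradients):
        # Average the gradients sent by all workers, then take one SGD step.
        self.weights -= self.lr * np.mean(gradients, axis=0)


def worker_gradient(weights, X, y):
    # Logistic-regression gradient on one worker's minibatch.
    p = 1.0 / (1.0 + np.exp(-(X @ weights)))
    return X.T @ (p - y) / len(y)


def train(server, shards, steps, batch=16, rng=None):
    # Each step: every worker reads the current weights, computes a gradient
    # on a minibatch drawn from its own shard, and the server aggregates them.
    if rng is None:
        rng = np.random.default_rng()
    for _ in range(steps):
        grads = []
        for X, y in shards:
            idx = rng.choice(len(y), size=batch, replace=False)
            grads.append(worker_gradient(server.weights, X[idx], y[idx]))
        server.apply(grads)


rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(800, 3))
y = (X @ true_w > 0).astype(float)

server = ParameterServer(dim=3)
# Initial training: first 400 examples split across 4 simulated workers.
train(server, [(X[i:400:4], y[i:400:4]) for i in range(4)], steps=100, rng=rng)
# Update with more data: continue from the current weights on 400 new examples.
train(server, [(X[400 + i::4], y[400 + i::4]) for i in range(4)], steps=100, rng=rng)

accuracy = np.mean((X @ server.weights > 0) == (y == 1))
print(f"training accuracy: {accuracy:.2f}")
```

Because each shard contributes an equally sized minibatch, the averaged update equals a single minibatch step over the union of the workers' batches; the point of the server is that no single machine needs to hold all the data.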

Provides asynchronous primitives for communicating parameters (such as gradients during SGD) across servers in the cluster.
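The asynchronous communication pattern can be illustrated with a small in-process sketch. It assumes a key-value interface in which workers push gradients and pull current weights; `AsyncKVStore`, `push`, and `pull` are hypothetical names, a thread pool stands in for network messages, and a lock stands in for a server node serializing updates. Both calls return a `concurrent.futures.Future` immediately, so the caller can keep computing and only blocks (via `.result()`) when it actually needs a value.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncKVStore:
    """Hypothetical stand-in for a server node; a thread pool plays the network."""

    def __init__(self, lr=0.1):
        self._weights = {}  # key -> float parameter value
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=4)
        self.lr = lr

    def push(self, grads: dict) -> Future:
        # Non-blocking: schedule the gradient update and return immediately.
        return self._pool.submit(self._apply, grads)

    def pull(self, keys) -> Future:
        # Non-blocking: the caller decides when it actually needs the values.
        return self._pool.submit(self._read, list(keys))

    def _apply(self, grads):
        with self._lock:
            for k, g in grads.items():
                self._weights[k] = self._weights.get(k, 0.0) - self.lr * g

    def _read(self, keys):
        with self._lock:
            return {k: self._weights.get(k, 0.0) for k in keys}

    def shutdown(self):
        self._pool.shutdown(wait=True)


store = AsyncKVStore(lr=0.1)
pending = [store.push({"w0": 1.0, "w1": -2.0}) for _ in range(10)]
# A worker could keep computing here; it only blocks when it needs the result.
for f in pending:
    f.result()
weights = store.pull(["w0", "w1"]).result()
print(weights)
store.shutdown()
```

Waiting on the pushes before pulling guarantees the pull sees all ten updates; dropping those waits would let the pull return partially updated (stale) weights, which is the trade-off asynchronous training accepts in exchange for not idling on communication.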


  • The authors report that their implementation of the Parameter Server (as of 2014) outperformed other similar distributed systems in training time on large datasets, for algorithms such as regularized Logistic Regression and LDA.


TensorFlow doesn't appear to have been available when this paper was written; in fact, the authors mention DistBelief, the precursor to TensorFlow.

MY 2¢

  • Judging from the paper itself, the training setup seems to require users to write distributed algorithms for each machine learning strategy; there's no clear indication of pre-made "building blocks" that users can build upon.

  • It's not very clear how well this setup generalizes to other algorithms and use cases.

