Distributed Deep Learning Network over Spark

Deep learning is becoming an important AI paradigm for pattern recognition, image/video processing, and fraud detection applications. The objective of this work is to build a distributed deep learning network over Spark by parallelizing the training phase.

Introduction – Geoffrey Hinton presented a paradigm for fast learning in a deep belief network [Hinton 2006]. This paper led to the breakthrough in the field, and most major software technology companies are now working on deep learning. Applications are being built on it in fields such as credit card fraud detection (Deep Learning Analytics from FICO) and multi-modal information processing. The team at Google led by Jeffrey Dean came up with the first implementation of distributed deep learning [Dean 2012]. 0xdata has recently released its H2O software, which includes a deep learning network in addition to several other machine learning algorithms. They have also made H2O work over Spark through the Sparkling Water project. Microsoft's Project Adam currently comes close to a fully distributed realization of a deep learning network. At present there is no deep learning implementation in MLlib, the machine learning library on top of Spark, nor one for Spark outside of MLlib.
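To make the objective of parallelizing the training phase concrete, here is a minimal sketch of one common approach, synchronous data-parallel gradient descent. It is an illustration under stated assumptions, not the design proposed here and not the method of [Dean 2012]: it uses a linear model instead of a deep network, plain Python lists stand in for Spark partitions, and all function names (partition_gradient, train_data_parallel, make_partitions) are hypothetical. Each partition computes a gradient on its own rows (the "map" step), and the partial gradients are summed to update a shared set of weights (the "reduce" step).

```python
import random


def partition_gradient(weights, rows):
    """Squared-error gradient of a linear model, summed over one partition."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad, len(rows)


def train_data_parallel(partitions, num_features, learning_rate=0.5, epochs=300):
    """Synchronous data-parallel gradient descent over a list of partitions.

    Assumption: each inner list plays the role of one Spark partition. On a
    real cluster the per-partition work would run on separate executors.
    """
    weights = [0.0] * num_features
    for _ in range(epochs):
        # "Map": every partition computes its gradient independently.
        partials = [partition_gradient(weights, rows) for rows in partitions]
        # "Reduce": sum the partial gradients and row counts.
        total_rows = sum(count for _, count in partials)
        summed = [sum(grad[i] for grad, _ in partials) for i in range(num_features)]
        # Update the shared weights with the averaged gradient.
        weights = [w - learning_rate * g / total_rows for w, g in zip(weights, summed)]
    return weights


def make_partitions(num_partitions=4, rows_per_partition=25, seed=0):
    """Synthetic noise-free data from a known linear model (illustration only)."""
    rng = random.Random(seed)
    true_weights = [2.0, -3.0]
    partitions = []
    for _ in range(num_partitions):
        rows = []
        for _ in range(rows_per_partition):
            features = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
            label = sum(w * x for w, x in zip(true_weights, features))
            rows.append((features, label))
        partitions.append(rows)
    return partitions, true_weights


if __name__ == "__main__":
    parts, target = make_partitions()
    learned = train_data_parallel(parts, num_features=2)
    print("target weights: ", target)
    print("learned weights:", learned)
```

Because the partial gradients are summed before each update, the result matches single-machine training on the combined data up to floating-point rounding. The cost is one synchronization per epoch, which is the main thing a real distributed design would need to manage.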

References
[Dean 2012] Dean, Jeffrey, et al. “Large scale distributed deep networks.” Advances in Neural Information Processing Systems. 2012.
[Deng 2013] Deng, Li, and Dong Yu. “Deep Learning: Methods and Applications.” Foundations and Trends in Signal Processing 7.3 (2013): 197-387.
[Hinton 2006] Hinton, G. E., S. Osindero, and Y. Teh. “A fast learning algorithm for deep belief nets.” Neural Computation 18 (2006): 1527-1554.
[Roux 2008] Le Roux, Nicolas, and Yoshua Bengio. “Representational power of restricted Boltzmann machines and deep belief networks.” Neural Computation 20.6 (2008): 1631-1649.