
A weighted aggregating SGD for scalable parallelization in deep learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

We investigate the stochastic optimization problem and develop a scalable parallel computing algorithm for deep learning tasks. The key to our study is a reformulation of the objective function for stochastic optimization in neural network models. Based on a theoretical analysis of the reformulated objective function, we propose a novel update rule named weighted aggregating stochastic gradient descent. The new rule introduces a weighted aggregation scheme based on the performance of local workers and does not require a center variable. It assesses the relative importance of each local worker and incorporates its updates according to its contribution. The rule supports both synchronous and asynchronous parallelization, with correspondingly different convergence rates. For evaluation, we benchmark our schemes against mainstream algorithms, including elastic averaging SGD, in training deep neural networks for classification tasks. We conduct extensive experiments on several classic datasets, and the results confirm the effectiveness of our scheme in accelerating the training of deep architectures and in scaling to parallel workers.
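The update rule itself is not given on this page; the sketch below illustrates one possible synchronous round of the weighted aggregation idea, in which each worker takes a local SGD step and all workers then adopt a performance-weighted average of their parameters. The softmax-over-negative-losses weighting, the function names, and the hyperparameters are assumptions for illustration and may differ from the paper's actual scheme.

    import numpy as np

    def weighted_aggregate(local_params, local_losses, temperature=1.0):
        """Performance-based weighted average of worker parameters.

        Assumed weighting: softmax over negative local losses, so
        better-performing workers (lower loss) receive larger weights.
        No center variable is maintained.
        """
        scores = -np.asarray(local_losses, dtype=float) / temperature
        scores -= scores.max()              # shift for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum()
        # Weighted average of the workers' parameter vectors.
        return sum(w * p for w, p in zip(weights, local_params))

    def synchronous_round(local_params, local_grads, local_losses, lr=0.01):
        """One synchronous round: each worker takes an SGD step on its own
        mini-batch, then all workers adopt the weighted aggregate."""
        stepped = [p - lr * g for p, g in zip(local_params, local_grads)]
        aggregated = weighted_aggregate(stepped, local_losses)
        return [aggregated.copy() for _ in local_params]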

Original language: English
Title of host publication: Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019
Editors: Jianyong Wang, Kyuseok Shim, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1072-1077
Number of pages: 6
ISBN (Electronic): 9781728146034
State: Published - Nov 2019
Event: 19th IEEE International Conference on Data Mining, ICDM 2019 - Beijing, China
Duration: Nov 8, 2019 - Nov 11, 2019

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2019-November

Conference

Conference: 19th IEEE International Conference on Data Mining, ICDM 2019
Country/Territory: China
City: Beijing
Period: 11/8/19 - 11/11/19

Keywords

  • Deep learning
  • Parallel computing
  • Stochastic gradient descent
