TY - GEN
T1 - SNAP
T2 - 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020
AU - Zhao, Yangming
AU - Fan, Jingyuan
AU - Su, Lu
AU - Song, Tongyu
AU - Wang, Sheng
AU - Qiao, Chunming
N1 - Publisher Copyright: ©2020 IEEE
PY - 2020/11
Y1 - 2020/11
N2 - More and more applications learn from the data collected by edge devices. Conventional learning methods, such as gathering all the raw data to train an ultimate model in a centralized way, or training a target model in a distributed manner under the parameter server framework, suffer from high communication cost. In this paper, we design Select Neighbors and Parameters (SNAP), a communication-efficient distributed machine learning framework, to mitigate the communication cost. A distinct feature of SNAP is that the edge servers act as peers to each other. Specifically, in SNAP, every edge server hosts a copy of the global model, trains it with the local data, and periodically updates the local parameters based on the weighted sum of the parameters from its neighbors (i.e., peers) only, without pulling the parameters from all other edge servers. Unlike most previous works on consensus optimization, in which the weight matrix used to update parameter values is predefined, we propose a scheme to optimize the weight matrix based on the network topology, thereby improving the convergence rate. Another key idea in SNAP is that only the parameters that have changed significantly since the last iteration are sent to the neighbors. Both theoretical analysis and simulations show that SNAP can achieve the same accuracy as the centralized training method. Compared to the state-of-the-art communication-aware distributed learning scheme TernGrad, SNAP incurs 99.6% lower communication cost.
AB - More and more applications learn from the data collected by edge devices. Conventional learning methods, such as gathering all the raw data to train an ultimate model in a centralized way, or training a target model in a distributed manner under the parameter server framework, suffer from high communication cost. In this paper, we design Select Neighbors and Parameters (SNAP), a communication-efficient distributed machine learning framework, to mitigate the communication cost. A distinct feature of SNAP is that the edge servers act as peers to each other. Specifically, in SNAP, every edge server hosts a copy of the global model, trains it with the local data, and periodically updates the local parameters based on the weighted sum of the parameters from its neighbors (i.e., peers) only, without pulling the parameters from all other edge servers. Unlike most previous works on consensus optimization, in which the weight matrix used to update parameter values is predefined, we propose a scheme to optimize the weight matrix based on the network topology, thereby improving the convergence rate. Another key idea in SNAP is that only the parameters that have changed significantly since the last iteration are sent to the neighbors. Both theoretical analysis and simulations show that SNAP can achieve the same accuracy as the centralized training method. Compared to the state-of-the-art communication-aware distributed learning scheme TernGrad, SNAP incurs 99.6% lower communication cost.
UR - https://www.scopus.com/pages/publications/85101995204
U2 - 10.1109/ICDCS47774.2020.00072
DO - 10.1109/ICDCS47774.2020.00072
M3 - Conference contribution
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 584
EP - 594
BT - Proceedings - 2020 IEEE 40th International Conference on Distributed Computing Systems, ICDCS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 29 November 2020 through 1 December 2020
ER -