TY - GEN
T1 - Distributed Multi-Armed Bandit over Arbitrary Undirected Graphs
AU - Zhu, Jingxuan
AU - Liu, Ji
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - This paper studies a distributed multi-armed bandit problem in a network of multiple agents, each of which can communicate only with its neighbors, where neighbor relationships are described by an undirected graph. Each agent makes a sequence of decisions on selecting an arm from a given set of candidates, yet it only has access to local samples of the reward for each action, which is an unknown random variable. All the agents share the same distribution of each arm's reward. A distributed upper confidence bound (UCB) algorithm is proposed for the agents to cooperatively learn the best arm, which does not require any global information. It is shown that the algorithm achieves a logarithmic regret for each of the agents, even though the graph is disconnected. The derived regret implies that the proposed distributed UCB algorithm enables a faster learning for any agent in the network compared with the classical single-agent UCB algorithm, as long as the agent has at least one neighbor.
AB - This paper studies a distributed multi-armed bandit problem in a network of multiple agents, each of which can communicate only with its neighbors, where neighbor relationships are described by an undirected graph. Each agent makes a sequence of decisions on selecting an arm from a given set of candidates, yet it only has access to local samples of the reward for each action, which is an unknown random variable. All the agents share the same distribution of each arm's reward. A distributed upper confidence bound (UCB) algorithm is proposed for the agents to cooperatively learn the best arm, which does not require any global information. It is shown that the algorithm achieves a logarithmic regret for each of the agents, even though the graph is disconnected. The derived regret implies that the proposed distributed UCB algorithm enables a faster learning for any agent in the network compared with the classical single-agent UCB algorithm, as long as the agent has at least one neighbor.
UR - https://www.scopus.com/pages/publications/85111920148
U2 - 10.1109/CDC45484.2021.9683253
DO - 10.1109/CDC45484.2021.9683253
M3 - Conference contribution
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 6976
EP - 6981
BT - 60th IEEE Conference on Decision and Control, CDC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 60th IEEE Conference on Decision and Control, CDC 2021
Y2 - 13 December 2021 through 17 December 2021
ER -