
Distributed Multi-Armed Bandit over Arbitrary Undirected Graphs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

This paper studies a distributed multi-armed bandit problem in a network of multiple agents, each of which can communicate only with its neighbors, where neighbor relationships are described by an undirected graph. Each agent makes a sequence of decisions on selecting an arm from a given set of candidates, yet it only has access to local samples of the reward of each action, which is an unknown random variable. All agents face the same reward distribution for each arm. A distributed upper confidence bound (UCB) algorithm is proposed for the agents to cooperatively learn the best arm, which does not require any global information. It is shown that the algorithm achieves a logarithmic regret for each of the agents, even when the graph is disconnected. The derived regret implies that the proposed distributed UCB algorithm enables faster learning for any agent in the network compared with the classical single-agent UCB algorithm, as long as the agent has at least one neighbor.
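To make the cooperation idea in the abstract concrete, here is a minimal, hypothetical sketch of a neighbor-sharing UCB scheme: each agent runs a standard UCB index, but folds the arm pulls and rewards observed by its graph neighbors into its own statistics, so agents with neighbors accumulate samples faster than an isolated learner. This is an illustrative simplification written for this summary, not the paper's exact algorithm; the graph layout, Bernoulli arms, and function names are assumptions.

```python
import math
import random

def distributed_ucb(adjacency, arm_means, horizon, seed=0):
    """Illustrative cooperative UCB sketch (not the paper's algorithm).

    adjacency: dict agent -> list of neighbor agents (undirected graph)
    arm_means: true mean rewards of Bernoulli arms (unknown to agents)
    horizon:   number of rounds
    Returns per-agent cumulative regret.
    """
    rng = random.Random(seed)
    agents = list(adjacency)
    K = len(arm_means)
    best = max(arm_means)
    # per-agent statistics: pull counts and reward sums for each arm
    counts = {a: [0] * K for a in agents}
    sums = {a: [0.0] * K for a in agents}
    regret = {a: 0.0 for a in agents}

    for t in range(1, horizon + 1):
        choices, rewards = {}, {}
        for a in agents:
            if t <= K:
                # initialization: play each arm once
                arm = t - 1
            else:
                # classical UCB1 index on the agent's (shared) statistics
                arm = max(range(K), key=lambda i:
                          sums[a][i] / counts[a][i]
                          + math.sqrt(2 * math.log(t) / counts[a][i]))
            choices[a] = arm
            rewards[a] = 1.0 if rng.random() < arm_means[arm] else 0.0
            regret[a] += best - arm_means[arm]
        # cooperation step: each agent also counts its neighbors' samples,
        # so confidence intervals shrink faster than for a lone learner
        for a in agents:
            for b in [a] + adjacency[a]:
                counts[a][choices[b]] += 1
                sums[a][choices[b]] += rewards[b]
    return regret
```

Note that agent 2 in a graph like `{0: [1], 1: [0], 2: []}` has no neighbors and simply reduces to single-agent UCB1, matching the abstract's remark that the speedup applies only to agents with at least one neighbor.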

Original language: English
Title of host publication: 60th IEEE Conference on Decision and Control, CDC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6976-6981
Number of pages: 6
ISBN (Electronic): 9781665436595
DOIs
State: Published - 2021
Event: 60th IEEE Conference on Decision and Control, CDC 2021 - Austin, United States
Duration: Dec 13, 2021 - Dec 17, 2021

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2021-December

Conference

Conference: 60th IEEE Conference on Decision and Control, CDC 2021
Country/Territory: United States
City: Austin
Period: 12/13/21 - 12/17/21
