
Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates

Research output: Contribution to journal › Conference article › peer-review

22 Scopus citations

Abstract

In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size T, O(log T), and further attains a tighter bound in both the covariate dimension d and the number of significant covariates s, O(s²(s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over the benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
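The minimax concave penalty at the heart of the algorithm has a simple closed form (due to Zhang, 2010): it grows like the Lasso penalty near zero but flattens out beyond a threshold, so large coefficients are not over-shrunk. A minimal sketch of the penalty function is below; the parameter values lam and gamma are illustrative, not taken from the paper.

```python
import numpy as np

def mcp_penalty(t, lam=1.0, gamma=3.0):
    """Minimax concave penalty (MCP), applied elementwise.

    p(t) = lam*|t| - t^2 / (2*gamma)   for |t| <= gamma*lam
         = gamma*lam^2 / 2             for |t| >  gamma*lam
    """
    t = np.abs(np.asarray(t, dtype=float))
    rising = lam * t - t**2 / (2 * gamma)   # Lasso-like region near zero
    flat = gamma * lam**2 / 2               # constant beyond the threshold
    return np.where(t <= gamma * lam, rising, flat)

# Near zero the slope matches the Lasso (lam); past gamma*lam the
# penalty is constant, leaving large coefficients unbiased.
print(mcp_penalty([0.0, 0.5, 10.0]))
```

This unbiasedness for large coefficients is what lets the MCP estimator match the oracle estimator with high probability, as the abstract notes.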

Original language: English
Pages (from-to): 5200-5208
Number of pages: 9
Journal: Proceedings of Machine Learning Research
Volume: 80
State: Published - 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 – Jul 15 2018
