Learning the Pareto Front Using Bootstrapped Observation Samples
Title: | Learning the Pareto Front Using Bootstrapped Observation Samples |
---|---|
Authors: | Kim, Wonyoung, Iyengar, Garud, Zeevi, Assaf |
Publication Year: | 2023 |
Collection: | Computer Science Statistics |
Subject Terms: | Statistics - Machine Learning, Computer Science - Machine Learning |
More Details: | We consider Pareto front identification (PFI) for linear bandits (PFILin), i.e., the goal is to identify a set of arms with undominated mean reward vectors when the mean reward vector is a linear function of the context. PFILin includes the best arm identification problem and multi-objective active learning as special cases. The sample complexity of our proposed algorithm is optimal up to a logarithmic factor. In addition, the regret incurred by our algorithm during the estimation is within a logarithmic factor of the optimal regret among all algorithms that identify the Pareto front. Our key contribution is a new estimator that in every round updates the estimate for the unknown parameter along multiple context directions -- in contrast to the conventional estimator that only updates the parameter estimate along the chosen context. This allows us to use low-regret arms to collect information about Pareto optimal arms. Our key innovation is to reuse the exploration samples multiple times; in contrast to conventional estimators that use each sample only once. Numerical experiments demonstrate that the proposed algorithm successfully identifies the Pareto front while controlling the regret. Comment: 37 pages including appendix |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2306.00096 |
Accession Number: | edsarx.2306.00096 |
Database: | arXiv |
Description not available. |