UID:
almafu_9960982130402883
Format:
1 online resource (xvi, 358 pages) : digital, PDF file(s).
Edition:
First edition.
ISBN:
9781108623551, 1108623557, 9781108348973, 1108348971
Content:
Bayesian optimization is a methodology for optimizing expensive objective functions that has proven success in the sciences, engineering, and beyond. This timely text provides a self-contained and comprehensive introduction to the subject, starting from scratch and carefully developing all the key ideas along the way. This bottom-up approach illuminates unifying themes in the design of Bayesian optimization algorithms and builds a solid theoretical foundation for approaching novel situations. The core of the book is divided into three main parts, covering theoretical and practical aspects of Gaussian process modeling, the Bayesian approach to sequential decision making, and the realization and computation of practical and effective optimization policies. Following this foundational material, the book provides an overview of theoretical convergence results, a survey of notable extensions, a comprehensive history of Bayesian optimization, and an extensive annotated bibliography of applications.
Note:
Title from publisher's bibliographic system (viewed on 25 Jan 2023).
Cover -- Half-title page -- Title page -- Copyright page -- Contents -- Preface -- Notation -- Symbols with Implicit Dependence on Location -- Comprehensive List of Symbols -- 1 Introduction -- 1.1 Formalization of Optimization -- Optimization policy -- Observation model -- Termination -- 1.2 The Bayesian Approach -- Bayesian inference -- Bayesian inference of the objective function -- Uncertainty-aware optimization policies -- 2 Gaussian Processes -- 2.1 Definition and Basic Properties -- Definition -- Example and basic properties -- Sampling -- 2.2 Inference with Exact and Noisy Observations -- Inference from arbitrary jointly Gaussian observations -- Corruption by additive Gaussian noise -- Interpretation of posterior moments -- Inference with exact function evaluations -- Inference with function evaluations corrupted by additive Gaussian noise -- 2.3 Overview of Remainder of Chapter -- 2.4 Joint Gaussian Processes -- Definition -- Example -- Inference for joint Gaussian processes -- 2.5 Continuity -- 2.6 Differentiability -- Conditioning on derivative observations -- Other linear transformations -- 2.7 Existence and Uniqueness of Global Maxima -- Existence of global maxima -- Uniqueness of global maxima -- Counterexamples -- 2.8 Inference with Non-Gaussian Observations and Constraints -- Non-Gaussian observations: general case -- Monte Carlo sampling -- Gaussian approximate inference -- 2.9 Summary of Major Ideas -- 3 Modeling with Gaussian Processes -- 3.1 The Prior Mean Function -- Constant mean function -- Linear combination of basis functions -- Other options -- 3.2 The Prior Covariance Function -- Stationarity, isotropy, and Bochner's theorem -- 3.3 Notable Covariance Functions -- The Matérn family -- The spectral mixture covariance -- Linear covariance function -- 3.4 Modifying and Combining Covariance Functions --
Scaling function outputs -- Transforming the domain and length scale parameters -- Nonlinear warping -- Combining covariance functions -- 3.5 Modeling Functions on High-Dimensional Domains -- Neural embeddings -- Linear embeddings -- 3.6 Summary of Major Ideas -- 4 Model Assessment, Selection, and Averaging -- 4.1 Models and Model Structures -- Spaces of candidate models -- Example -- 4.2 Bayesian Inference over Parametric Model Spaces -- Model prior -- Model posterior -- Marginal likelihood and Bayesian Occam's razor -- Return to example -- 4.3 Model Selection via Posterior Maximization -- 4.4 Model Averaging -- Monte Carlo approximation -- Deterministic approximation schemes -- 4.5 Multiple Model Structures -- Multiple structure example -- 4.6 Automating Model Structure Search -- Spaces of candidate model structures -- Searching over model structures -- End-to-end automation -- 4.7 Summary of Major Ideas -- 5 Decision Theory for Optimization -- 5.1 Introduction to Bayesian Decision Theory -- Isolated decisions -- Example: recommending a point for use after optimization -- 5.2 Sequential Decisions with a Fixed Budget -- Modeling policy decisions -- Uncertainty faced during optimization -- Fixed budget: one observation remaining -- Fixed budget: two observations remaining -- Fixed budget: inductive case -- Bellman optimality and the Bellman equation -- 5.3 Cost and Approximation of the Optimal Policy -- Limited lookahead -- Rollout -- 5.4 Cost-Aware Optimization and Termination as a Decision -- Modeling termination decisions and the optimal policy -- Example: cost-aware optimization -- Demonstration: one-step lookahead with cost-aware utility -- 5.5 Summary of Major Ideas -- 6 Utility Functions for Optimization -- 6.1 Expected Utility of Terminal Recommendation -- Formalization of terminal recommendation decision -- Choosing an action space --
Choosing a utility function and risk tolerance -- Simple reward -- Global reward -- A tempting, but nonsensical alternative -- 6.2 Cumulative Reward -- 6.3 Information Gain -- 6.4 Dependence on Model of Objective Function -- 6.5 Comparison of Utility Functions -- Good global outcome but poor local outcome -- Good local outcome but poor global outcome -- 6.6 Summary of Major Ideas -- 7 Common Bayesian Optimization Policies -- 7.1 Example Optimization Scenario -- 7.2 Decision-Theoretic Policies -- One-step lookahead -- 7.3 Expected Improvement -- 7.4 Knowledge Gradient -- 7.5 Probability of Improvement -- The role of the improvement target -- 7.6 Mutual Information and Entropy Search -- Mutual information -- Maximizing mutual information as an optimization policy -- Mutual information with x* -- Mutual information with f* -- 7.7 Multi-Armed Bandits and Optimization -- The Bayesian optimal policy -- Optimization as an infinite-armed bandit -- 7.8 Maximizing a Statistical Upper Bound -- 7.9 Thompson Sampling -- 7.10 Other Ideas in Policy Construction -- Approximate dynamic programming beyond one-step lookahead -- Artificially encouraging exploration -- Lookahead for cumulative reward? -- 7.11 Summary of Major Ideas -- 8 Computing Policies with Gaussian Processes -- 8.1 Notation for Objective Function Model -- 8.2 Expected Improvement -- Expected improvement without noise -- Expected improvement with noise -- Alternative formulations of noisy expected improvement -- 8.3 Probability of Improvement -- Probability of improvement without noise -- Probability of improvement with noise -- 8.4 Upper Confidence Bound -- Correspondence with probability of improvement -- 8.5 Approximate Computation for One-Step Lookahead -- 8.6 Knowledge Gradient -- Exact computation in discrete domains -- Approximation via numerical quadrature --
Knowledge gradient for continuous parameters approximation -- 8.7 Thompson Sampling -- Exhaustive sampling -- On-demand sampling -- Sparse spectrum approximation for stationary covariance functions -- 8.8 Mutual Information with x* -- Direct form of mutual information -- Predictive form of mutual information -- Approximating expectation with respect to x* -- Gaussian approximation to conditional predictive distribution -- Approximation via Gaussian expectation propagation -- Ensuring x* is a local optimum -- Ensuring x* is a global optimum -- Efficient implementation -- 8.9 Mutual Information with f* -- Approximating expectation with respect to f* -- Approximating quantiles of f* -- Predictive entropy with exact observations -- Approximating predictive entropy with noisy observations -- 8.10 Averaging over a Space of Gaussian Processes -- Noiseless expected improvement and probability of improvement -- Model-dependent utility functions -- Example and discussion -- Computing expected gain in marginal utility -- Upper confidence bound and Thompson sampling -- 8.11 Alternative Models: Bayesian Neural Networks, etc. -- Random forests -- Density ratio estimation -- Bayesian neural networks -- 8.12 Summary of Major Ideas -- 9 Implementation -- 9.1 Gaussian Process Inference, Scaling, and Approximation -- Direct computation via Cholesky decomposition -- Low-rank updates to the Cholesky factorization for sequential inference -- Ill-conditioned covariance matrices -- Iterative numerical methods -- Sparse approximations -- 9.2 Optimizing Acquisition Functions -- Optimization approaches -- Optimization in latent spaces -- Optimization on combinatorial domains -- The benefit of imperfect optimization? -- 9.3 Starting and Stopping Optimization -- Initialization -- Termination -- 9.4 Summary of Major Ideas --
10 Theoretical Analysis -- 10.1 Regret -- Simple regret -- Cumulative regret -- Relationship between simple and cumulative regret -- 10.2 Useful Function Spaces for Studying Convergence -- Convergence on all continuous functions -- Convergence results for all continuous functions on the unit interval -- Convergence on "nice" continuous functions -- Sample paths of a Gaussian process -- The worst-case alternative -- Reproducing kernel Hilbert spaces -- Reproducing kernel Hilbert space norm -- 10.3 Relevant Properties of Covariance Functions -- Smoothness of sample paths -- Information capacity -- Known bounds on information capacity -- Bounding the sum of predictive variances -- 10.4 Bayesian Regret with Observation Noise -- Common assumptions -- Upper confidence bound -- Regret bound on finite domains -- Extending to continuous domains -- The Lipschitz tail bound condition -- Bounding the expected regret -- Thompson sampling -- Upper bounds on simple regret -- Lower bounds and tightness of existing algorithms -- 10.5 Worst-Case Regret with Observation Noise -- Common assumptions -- Upper confidence bound and Thompson sampling -- Simple regret bounds in the nonadaptive setting -- Closing the cumulative regret bound gap -- Lower bounds and tightness of existing algorithms -- 10.6 The Exact Observation Case -- Bounding the posterior standard deviation -- Optimizing fill distance -- Worst-case regret with deterministic observations -- Bayesian regret with deterministic observations -- 10.7 The Effect of Unknown Hyperparameters -- 10.8 Summary of Major Ideas -- 11 Extensions and Related Settings -- 11.1 Unknown Observation Costs -- Inference for unknown cost function -- Decision making with unknown costs -- Expected gain per unit cost -- 11.2 Constrained Optimization and Unknown Constraints -- Modeling constraint functions -- Defining a utility function --
Deriving a policy.
Additional Edition:
ISBN 9781108425780
Language:
English