    Online Resource
    Cham, Switzerland: Springer
    UID: almafu_9961000629602883
    Format: 1 online resource (137 pages)
    ISBN: 9783031190674
    Series Statement: Synthesis Lectures on Learning, Networks, and Algorithms
    Note: Intro -- Preface -- Contents -- Acronyms and Symbols --
    1 Distributed Optimization in Machine Learning -- 1.1 SGD in Supervised Machine Learning -- 1.1.1 Training Data and Hypothesis -- 1.1.2 Empirical Risk Minimization -- 1.1.3 Gradient Descent -- 1.1.4 Stochastic Gradient Descent -- 1.1.5 Mini-batch SGD -- 1.1.6 Linear Regression -- 1.1.7 Logistic Regression -- 1.1.8 Neural Networks -- 1.2 Distributed Stochastic Gradient Descent -- 1.2.1 The Parameter Server Framework -- 1.2.2 The System-Aware Design Philosophy -- 1.3 Scalable Distributed SGD Algorithms -- 1.3.1 Straggler-Resilient and Asynchronous SGD -- 1.3.2 Communication-Efficient Distributed SGD -- 1.3.3 Decentralized SGD --
    2 Calculus, Probability and Order Statistics Review -- 2.1 Calculus and Linear Algebra -- 2.1.1 Norms and Inner Products -- 2.1.2 Lipschitz Continuity and Smoothness -- 2.1.3 Strong Convexity -- 2.2 Probability Review -- 2.2.1 Random Variable -- 2.2.2 Expectation and Variance -- 2.2.3 Some Canonical Random Variables -- 2.2.4 Bayes Rule and Conditional Probability -- 2.3 Order Statistics -- 2.3.1 Order Statistics of the Exponential Distribution -- 2.3.2 Order Statistics of the Uniform Distribution -- 2.3.3 Asymptotic Distribution of Quantiles --
    3 Convergence of SGD and Variance-Reduced Variants -- 3.1 Gradient Descent (GD) Convergence -- 3.1.1 Effect of Learning Rate and Other Parameters -- 3.1.2 Iteration Complexity -- 3.2 Convergence Analysis of Mini-batch SGD -- 3.2.1 Effect of Learning Rate and Mini-batch Size -- 3.2.2 Iteration Complexity -- 3.2.3 Non-convex Objectives -- 3.3 Variance-Reduced SGD Variants -- 3.3.1 Dynamic Mini-batch Size Schedule -- 3.3.2 Stochastic Average Gradient (SAG) -- 3.3.3 Stochastic Variance Reduced Gradient (SVRG) --
    4 Synchronous SGD and Straggler-Resilient Variants -- 4.1 Parameter Server Framework -- 4.2 Distributed Synchronous SGD Algorithm -- 4.3 Convergence Analysis -- 4.3.1 Iteration Complexity -- 4.4 Runtime per Iteration -- 4.4.1 Gradient Computation and Communication Time -- 4.4.2 Expected Runtime per Iteration -- 4.4.3 Error Versus Runtime Convergence -- 4.5 Straggler-Resilient Variants -- 4.5.1 K-Synchronous SGD -- 4.5.2 K-Batch-Synchronous SGD --
    5 Asynchronous SGD and Staleness-Reduced Variants -- 5.1 The Asynchronous SGD Algorithm -- 5.1.1 Comparison with Synchronous SGD -- 5.2 Runtime Analysis -- 5.2.1 Runtime Speed-Up Compared to Synchronous SGD -- 5.3 Convergence Analysis -- 5.3.1 Implications of the Asynchronous SGD Convergence Bound -- 5.4 Staleness-Reduced Variants of Asynchronous SGD -- 5.4.1 K-Asynchronous SGD -- 5.4.2 K-Batch-Asynchronous SGD -- 5.5 Adaptive Methods to Improve the Error-Runtime Trade-Off -- 5.5.1 Adaptive Synchronization -- 5.5.2 Adaptive Learning Rate Schedule to Compensate Staleness -- 5.6 HogWild and Lock-Free Parallelism --
    6 Local-Update and Overlap SGD -- 6.1 Local-Update SGD Algorithm -- 6.1.1 Convergence Analysis -- 6.1.2 Runtime Analysis -- 6.1.3 Adaptive Communication -- 6.2 Elastic and Overlap SGD -- 6.2.1 Elastic Averaging SGD -- 6.2.2 Overlap Local SGD --
    7 Quantized and Sparsified Distributed SGD -- 7.1 Quantized SGD -- 7.1.1 Uniform Stochastic Quantization -- 7.1.2 Convergence Analysis -- 7.1.3 Runtime Analysis -- 7.1.4 Adaptive Quantization -- 7.2 Sparsified SGD -- 7.2.1 Rand-k Sparsification -- 7.2.2 Top-k Sparsification -- 7.2.3 Rand-k Sparsified Distributed SGD -- 7.2.4 Error Feedback in Sparsified SGD --
    8 Decentralized SGD and Its Variants -- 8.1 Network Topology and Graph Notation -- 8.1.1 Adjacency Matrix -- 8.1.2 Laplacian Matrix -- 8.1.3 Mixing Matrix -- 8.2 Decentralized SGD -- 8.2.1 The Algorithm -- 8.2.2 Variants of Decentralized SGD -- 8.3 Error Convergence Analysis -- 8.3.1 Assumptions -- 8.3.2 Convergence Analysis of Decentralized SGD -- 8.3.3 Convergence Analysis of Decentralized Local-Update SGD -- 8.4 Runtime Analysis --
    9 Beyond Distributed Training in the Cloud.
    Additional Edition: Print version: Joshi, Gauri. Optimization Algorithms for Distributed Machine Learning. Cham: Springer International Publishing AG, c2023. ISBN 9783031190667
    Language: English