  • 1
    Online resource
    Cham: Springer International Publishing
    UID: almahu_9949420054202882
    Extent: XIII, 127 p., 40 illus., 38 illus. in color, online resource
    Edition: 1st ed. 2023
    ISBN: 9783031190674
    Series: Synthesis Lectures on Learning, Networks, and Algorithms
    Content: This book discusses state-of-the-art stochastic optimization algorithms for distributed machine learning and analyzes their convergence speed. The book first introduces stochastic gradient descent (SGD) and its distributed version, synchronous SGD, in which the task of computing gradients is divided across several worker nodes. The author then discusses several algorithms that improve the scalability and communication efficiency of synchronous SGD, such as asynchronous SGD, local-update SGD, quantized and sparsified SGD, and decentralized SGD. For each of these algorithms, the book analyzes its error-versus-iterations convergence and the runtime spent per iteration. The author shows that each of these strategies for reducing communication or synchronization delays encounters a fundamental trade-off between error and runtime. (A toy sketch of the synchronous SGD pattern described here appears at the end of this record.)
    Note: Distributed Optimization in Machine Learning -- Calculus, Probability and Order Statistics Review -- Convergence of SGD and Variance-Reduced Variants -- Synchronous SGD and Straggler-Resilient Variants -- Asynchronous SGD and Staleness-Reduced Variants -- Local-update and Overlap SGD -- Quantized and Sparsified Distributed SGD -- Decentralized SGD and its Variants.
    In: Springer Nature eBook
    Other edition: Printed edition: ISBN 9783031190667
    Other edition: Printed edition: ISBN 9783031190681
    Other edition: Printed edition: ISBN 9783031190698
    Language: English
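    The abstract above describes synchronous SGD, in which several workers each compute a mini-batch gradient and a server averages those gradients before taking one update step. Below is a minimal sketch of that pattern in plain Python/NumPy, assuming a toy linear-regression problem; the worker count, batch size, and learning rate are illustrative assumptions and do not come from the book.

        # Minimal sketch of synchronous distributed SGD (illustrative assumptions only).
        import numpy as np

        rng = np.random.default_rng(0)

        # Toy linear-regression data, split evenly across W workers.
        W, n_per_worker, d = 4, 64, 5
        w_true = rng.normal(size=d)
        X = [rng.normal(size=(n_per_worker, d)) for _ in range(W)]
        y = [Xi @ w_true + 0.1 * rng.normal(size=n_per_worker) for Xi in X]

        def worker_gradient(w, Xi, yi, batch_size=8):
            """Mini-batch gradient of the mean squared loss at one worker."""
            idx = rng.choice(len(yi), size=batch_size, replace=False)
            Xb, yb = Xi[idx], yi[idx]
            return 2.0 * Xb.T @ (Xb @ w - yb) / batch_size

        w = np.zeros(d)
        eta = 0.05  # learning rate
        for it in range(200):
            # In a real deployment the W gradients are computed in parallel on worker
            # nodes; the server waits for all of them, averages, and updates the model.
            grads = [worker_gradient(w, X[k], y[k]) for k in range(W)]
            w -= eta * np.mean(grads, axis=0)

        print("distance to w_true:", np.linalg.norm(w - w_true))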
  • 2
    Online resource
    Cham, Switzerland: Springer
    UID: edoccha_9961000629602883
    Extent: 1 online resource (137 pages)
    ISBN: 9783031190674
    Series: Synthesis Lectures on Learning, Networks, and Algorithms
    Note: Intro -- Preface -- Contents -- Acronyms and Symbols -- 1 Distributed Optimization in Machine Learning -- 1.1 SGD in Supervised Machine Learning -- 1.1.1 Training Data and Hypothesis -- 1.1.2 Empirical Risk Minimization -- 1.1.3 Gradient Descent -- 1.1.4 Stochastic Gradient Descent -- 1.1.5 Mini-batch SGD -- 1.1.6 Linear Regression -- 1.1.7 Logistic Regression -- 1.1.8 Neural Networks -- 1.2 Distributed Stochastic Gradient Descent -- 1.2.1 The Parameter Server Framework -- 1.2.2 The System-Aware Design Philosophy -- 1.3 Scalable Distributed SGD Algorithms -- 1.3.1 Straggler-Resilient and Asynchronous SGD -- 1.3.2 Communication-Efficient Distributed SGD -- 1.3.3 Decentralized SGD -- 2 Calculus, Probability and Order Statistics Review -- 2.1 Calculus and Linear Algebra -- 2.1.1 Norms and Inner Products -- 2.1.2 Lipschitz Continuity and Smoothness -- 2.1.3 Strong Convexity -- 2.2 Probability Review -- 2.2.1 Random Variable -- 2.2.2 Expectation and Variance -- 2.2.3 Some Canonical Random Variables -- 2.2.4 Bayes Rule and Conditional Probability -- 2.3 Order Statistics -- 2.3.1 Order Statistics of the Exponential Distribution -- 2.3.2 Order Statistics of the Uniform Distribution -- 2.3.3 Asymptotic Distribution of Quantiles -- 3 Convergence of SGD and Variance-Reduced Variants -- 3.1 Gradient Descent (GD) Convergence -- 3.1.1 Effect of Learning Rate and Other Parameters -- 3.1.2 Iteration Complexity -- 3.2 Convergence Analysis of Mini-batch SGD -- 3.2.1 Effect of Learning Rate and Mini-batch Size -- 3.2.2 Iteration Complexity -- 3.2.3 Non-convex Objectives -- 3.3 Variance-Reduced SGD Variants -- 3.3.1 Dynamic Mini-batch Size Schedule -- 3.3.2 Stochastic Average Gradient (SAG) -- 3.3.3 Stochastic Variance Reduced Gradient (SVRG) -- 4 Synchronous SGD and Straggler-Resilient Variants -- 4.1 Parameter Server Framework -- 4.2 Distributed Synchronous SGD Algorithm -- 4.3 Convergence Analysis -- 4.3.1 Iteration Complexity -- 4.4 Runtime per Iteration -- 4.4.1 Gradient Computation and Communication Time -- 4.4.2 Expected Runtime per Iteration -- 4.4.3 Error Versus Runtime Convergence -- 4.5 Straggler-Resilient Variants -- 4.5.1 K-Synchronous SGD -- 4.5.2 K-Batch-Synchronous SGD -- 5 Asynchronous SGD and Staleness-Reduced Variants -- 5.1 The Asynchronous SGD Algorithm -- 5.1.1 Comparison with Synchronous SGD -- 5.2 Runtime Analysis -- 5.2.1 Runtime Speed-Up Compared to Synchronous SGD -- 5.3 Convergence Analysis -- 5.3.1 Implications of the Asynchronous SGD Convergence Bound -- 5.4 Staleness-Reduced Variants of Asynchronous SGD -- 5.4.1 K-Asynchronous SGD -- 5.4.2 K-Batch-Asynchronous SGD -- 5.5 Adaptive Methods to Improve the Error-Runtime Trade-Off -- 5.5.1 Adaptive Synchronization -- 5.5.2 Adaptive Learning Rate Schedule to Compensate Staleness -- 5.6 HogWild and Lock-Free Parallelism -- 6 Local-Update and Overlap SGD -- 6.1 Local-Update SGD Algorithm -- 6.1.1 Convergence Analysis -- 6.1.2 Runtime Analysis -- 6.1.3 Adaptive Communication -- 6.2 Elastic and Overlap SGD -- 6.2.1 Elastic Averaging SGD -- 6.2.2 Overlap Local SGD -- 7 Quantized and Sparsified Distributed SGD -- 7.1 Quantized SGD -- 7.1.1 Uniform Stochastic Quantization -- 7.1.2 Convergence Analysis -- 7.1.3 Runtime Analysis -- 7.1.4 Adaptive Quantization -- 7.2 Sparsified SGD -- 7.2.1 Rand-k Sparsification -- 7.2.2 Top-k Sparsification -- 7.2.3 Rand-k Sparsified Distributed SGD -- 7.2.4 Error Feedback in Sparsified SGD -- 8 Decentralized SGD and Its Variants -- 8.1 Network Topology and Graph Notation -- 8.1.1 Adjacency Matrix -- 8.1.2 Laplacian Matrix -- 8.1.3 Mixing Matrix -- 8.2 Decentralized SGD -- 8.2.1 The Algorithm -- 8.2.2 Variants of Decentralized SGD -- 8.3 Error Convergence Analysis -- 8.3.1 Assumptions -- 8.3.2 Convergence Analysis of Decentralized SGD -- 8.3.3 Convergence Analysis of Decentralized Local-Update SGD -- 8.4 Runtime Analysis -- 9 Beyond Distributed Training in the Cloud. (A toy sketch of the uniform stochastic quantization idea listed under Chap. 7 appears at the end of this record.)
    Other edition: Print version: Joshi, Gauri. Optimization Algorithms for Distributed Machine Learning. Cham: Springer International Publishing AG, c2023. ISBN 9783031190667
    Language: English
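    The contents listed above include quantized distributed SGD (Chap. 7), in which workers compress their gradients before communicating them. Below is a minimal sketch of the uniform stochastic quantization idea named in Sect. 7.1.1, assuming an unbiased coordinate-wise quantizer with s levels; the function name, level count, and example vector are illustrative assumptions, not the book's code.

        # Minimal sketch of unbiased uniform stochastic quantization of a gradient
        # vector (illustrative assumptions only).
        import numpy as np

        def uniform_stochastic_quantize(v, s=4, rng=None):
            """Quantize each |v_i|/||v|| onto s+1 uniform levels, rounding stochastically
            so that the quantized vector is an unbiased estimate of v."""
            rng = np.random.default_rng() if rng is None else rng
            norm = np.linalg.norm(v)
            if norm == 0.0:
                return np.zeros_like(v)
            scaled = np.abs(v) / norm * s   # position of |v_i| in [0, s]
            lower = np.floor(scaled)        # nearest quantization level below
            prob_up = scaled - lower        # round up with this probability
            levels = lower + (rng.random(v.shape) < prob_up)
            # Only the norm, the signs, and the integer levels need to be transmitted.
            return np.sign(v) * norm * levels / s

        g = np.array([0.8, -0.3, 0.05, 0.0, -1.2])
        rng = np.random.default_rng(0)
        print(uniform_stochastic_quantize(g, s=4, rng=rng))
        # Averaging many independent quantizations recovers g (unbiasedness):
        print(np.mean([uniform_stochastic_quantize(g, 4, rng) for _ in range(5000)], axis=0))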
  • 3
    Online resource
    Cham, Switzerland: Springer
    UID: almafu_9961000629602883
    Extent: 1 online resource (137 pages)
    ISBN: 9783031190674
    Series: Synthesis Lectures on Learning, Networks, and Algorithms
    Note: Intro -- Preface -- Contents -- Acronyms and Symbols -- 1 Distributed Optimization in Machine Learning -- 1.1 SGD in Supervised Machine Learning -- 1.1.1 Training Data and Hypothesis -- 1.1.2 Empirical Risk Minimization -- 1.1.3 Gradient Descent -- 1.1.4 Stochastic Gradient Descent -- 1.1.5 Mini-batch SGD -- 1.1.6 Linear Regression -- 1.1.7 Logistic Regression -- 1.1.8 Neural Networks -- 1.2 Distributed Stochastic Gradient Descent -- 1.2.1 The Parameter Server Framework -- 1.2.2 The System-Aware Design Philosophy -- 1.3 Scalable Distributed SGD Algorithms -- 1.3.1 Straggler-Resilient and Asynchronous SGD -- 1.3.2 Communication-Efficient Distributed SGD -- 1.3.3 Decentralized SGD -- 2 Calculus, Probability and Order Statistics Review -- 2.1 Calculus and Linear Algebra -- 2.1.1 Norms and Inner Products -- 2.1.2 Lipschitz Continuity and Smoothness -- 2.1.3 Strong Convexity -- 2.2 Probability Review -- 2.2.1 Random Variable -- 2.2.2 Expectation and Variance -- 2.2.3 Some Canonical Random Variables -- 2.2.4 Bayes Rule and Conditional Probability -- 2.3 Order Statistics -- 2.3.1 Order Statistics of the Exponential Distribution -- 2.3.2 Order Statistics of the Uniform Distribution -- 2.3.3 Asymptotic Distribution of Quantiles -- 3 Convergence of SGD and Variance-Reduced Variants -- 3.1 Gradient Descent (GD) Convergence -- 3.1.1 Effect of Learning Rate and Other Parameters -- 3.1.2 Iteration Complexity -- 3.2 Convergence Analysis of Mini-batch SGD -- 3.2.1 Effect of Learning Rate and Mini-batch Size -- 3.2.2 Iteration Complexity -- 3.2.3 Non-convex Objectives -- 3.3 Variance-Reduced SGD Variants -- 3.3.1 Dynamic Mini-batch Size Schedule -- 3.3.2 Stochastic Average Gradient (SAG) -- 3.3.3 Stochastic Variance Reduced Gradient (SVRG) -- 4 Synchronous SGD and Straggler-Resilient Variants -- 4.1 Parameter Server Framework -- 4.2 Distributed Synchronous SGD Algorithm -- 4.3 Convergence Analysis -- 4.3.1 Iteration Complexity -- 4.4 Runtime per Iteration -- 4.4.1 Gradient Computation and Communication Time -- 4.4.2 Expected Runtime per Iteration -- 4.4.3 Error Versus Runtime Convergence -- 4.5 Straggler-Resilient Variants -- 4.5.1 K-Synchronous SGD -- 4.5.2 K-Batch-Synchronous SGD -- 5 Asynchronous SGD and Staleness-Reduced Variants -- 5.1 The Asynchronous SGD Algorithm -- 5.1.1 Comparison with Synchronous SGD -- 5.2 Runtime Analysis -- 5.2.1 Runtime Speed-Up Compared to Synchronous SGD -- 5.3 Convergence Analysis -- 5.3.1 Implications of the Asynchronous SGD Convergence Bound -- 5.4 Staleness-Reduced Variants of Asynchronous SGD -- 5.4.1 K-Asynchronous SGD -- 5.4.2 K-Batch-Asynchronous SGD -- 5.5 Adaptive Methods to Improve the Error-Runtime Trade-Off -- 5.5.1 Adaptive Synchronization -- 5.5.2 Adaptive Learning Rate Schedule to Compensate Staleness -- 5.6 HogWild and Lock-Free Parallelism -- 6 Local-Update and Overlap SGD -- 6.1 Local-Update SGD Algorithm -- 6.1.1 Convergence Analysis -- 6.1.2 Runtime Analysis -- 6.1.3 Adaptive Communication -- 6.2 Elastic and Overlap SGD -- 6.2.1 Elastic Averaging SGD -- 6.2.2 Overlap Local SGD -- 7 Quantized and Sparsified Distributed SGD -- 7.1 Quantized SGD -- 7.1.1 Uniform Stochastic Quantization -- 7.1.2 Convergence Analysis -- 7.1.3 Runtime Analysis -- 7.1.4 Adaptive Quantization -- 7.2 Sparsified SGD -- 7.2.1 Rand-k Sparsification -- 7.2.2 Top-k Sparsification -- 7.2.3 Rand-k Sparsified Distributed SGD -- 7.2.4 Error Feedback in Sparsified SGD -- 8 Decentralized SGD and Its Variants -- 8.1 Network Topology and Graph Notation -- 8.1.1 Adjacency Matrix -- 8.1.2 Laplacian Matrix -- 8.1.3 Mixing Matrix -- 8.2 Decentralized SGD -- 8.2.1 The Algorithm -- 8.2.2 Variants of Decentralized SGD -- 8.3 Error Convergence Analysis -- 8.3.1 Assumptions -- 8.3.2 Convergence Analysis of Decentralized SGD -- 8.3.3 Convergence Analysis of Decentralized Local-Update SGD -- 8.4 Runtime Analysis -- 9 Beyond Distributed Training in the Cloud.
    Other edition: Print version: Joshi, Gauri. Optimization Algorithms for Distributed Machine Learning. Cham: Springer International Publishing AG, c2023. ISBN 9783031190667
    Language: English