UID:
almafu_9960118247702883
Extent:
1 online resource (viii, 424 pages) : digital, PDF file(s).
ISBN:
1-108-62032-9, 1-108-61736-0, 1-108-75552-6
Contents:
This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and the analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of machine learning and its key algorithms, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, and wavelets and compressed sensing. Important probabilistic techniques are developed, including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for the analysis of phase transitions in large random graphs. Important structural and complexity measures, such as matrix norms and VC-dimension, are also discussed. The book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.
Note:
Title from publisher's bibliographic system (viewed on 29 Jan 2020).
Note:
Cover -- Half-title -- Title page -- Copyright information -- Contents
1 Introduction
2 High-Dimensional Space -- 2.1 Introduction -- 2.2 The Law of Large Numbers -- 2.3 The Geometry of High Dimensions -- 2.4 Properties of the Unit Ball -- 2.4.1 Volume of the Unit Ball -- 2.4.2 Volume near the Equator -- 2.5 Generating Points Uniformly at Random from a Ball -- 2.6 Gaussians in High Dimension -- 2.7 Random Projection and Johnson-Lindenstrauss Lemma -- 2.8 Separating Gaussians -- 2.9 Fitting a Spherical Gaussian to Data -- 2.10 Bibliographic Notes -- 2.11 Exercises
3 Best-Fit Subspaces and Singular Value Decomposition (SVD) -- 3.1 Introduction -- 3.2 Preliminaries -- 3.3 Singular Vectors -- 3.4 Singular Value Decomposition (SVD) -- 3.5 Best Rank-k Approximations -- 3.6 Left Singular Vectors -- 3.7 Power Method for Singular Value Decomposition -- 3.7.1 A Faster Method -- 3.8 Singular Vectors and Eigenvectors -- 3.9 Applications of Singular Value Decomposition -- 3.9.1 Centering Data -- 3.9.2 Principal Component Analysis -- 3.9.3 Clustering a Mixture of Spherical Gaussians -- 3.9.4 Ranking Documents and Web Pages -- 3.9.5 An Illustrative Application of SVD -- 3.9.6 An Application of SVD to a Discrete Optimization Problem -- 3.10 Bibliographic Notes -- 3.11 Exercises
4 Random Walks and Markov Chains -- 4.1 Stationary Distribution -- 4.2 Markov Chain Monte Carlo -- 4.2.1 Metropolis-Hastings Algorithm -- 4.2.2 Gibbs Sampling -- 4.3 Areas and Volumes -- 4.4 Convergence of Random Walks on Undirected Graphs -- 4.4.1 Using Normalized Conductance to Prove Convergence -- 4.5 Electrical Networks and Random Walks -- 4.6 Random Walks on Undirected Graphs with Unit Edge Weights -- 4.6.1 Hitting Time -- 4.6.2 Commute Time -- 4.6.3 Cover Time -- 4.7 Random Walks in Euclidean Space -- 4.7.1 Random Walks on Lattices -- 4.7.2 Two Dimensions -- 4.7.3 Three Dimensions -- 4.8 The Web as a Markov Chain -- 4.8.1 Pagerank -- 4.8.2 Relation to Hitting Time -- 4.8.3 Spam -- 4.8.4 Personalized Pagerank -- 4.8.5 Algorithm for Computing Personalized Pagerank -- 4.9 Bibliographic Notes -- 4.10 Exercises
5 Machine Learning -- 5.1 Introduction -- 5.1.1 The Core Problem -- 5.1.2 How to Learn -- 5.2 The Perceptron Algorithm -- 5.3 Kernel Functions and Nonlinearly Separable Data -- 5.4 Generalizing to New Data -- 5.4.1 Overfitting and Uniform Convergence -- 5.4.2 Occam's Razor -- 5.4.3 Regularization: Penalizing Complexity -- 5.5 VC-Dimension -- 5.5.1 Definitions and Key Theorems -- 5.5.2 VC-Dimension of Some Set Systems -- 5.5.3 Shatter Function for Set Systems of Bounded VC-Dimension -- 5.5.4 VC-Dimension of Combinations of Concepts -- 5.5.5 The Key Theorem -- 5.6 VC-Dimension and Machine Learning -- 5.7 Other Measures of Complexity -- 5.8 Deep Learning -- 5.8.1 Generative Adversarial Networks (GANs) -- 5.9 Gradient Descent -- 5.9.1 Stochastic Gradient Descent -- 5.9.2 Regularizer -- 5.10 Online Learning -- 5.10.1 An Example: Learning Disjunctions -- 5.10.2 The Halving Algorithm -- 5.10.3 The Perceptron Algorithm -- 5.10.4 Inseparable Data and Hinge Loss -- 5.10.5 Online to Batch Conversion -- 5.10.6 Combining (Sleeping) Expert Advice -- 5.11 Boosting -- 5.12 Further Current Directions -- 5.12.1 Semi-Supervised Learning -- 5.12.2 Active Learning -- 5.12.3 Multitask Learning -- 5.13 Bibliographic Notes -- 5.14 Exercises
6 Algorithms for Massive Data Problems: Streaming, Sketching, and Sampling -- 6.1 Introduction -- 6.2 Frequency Moments of Data Streams -- 6.2.1 Number of Distinct Elements in a Data Stream -- 6.2.2 Number of Occurrences of a Given Element -- 6.2.3 Frequent Elements -- 6.2.4 The Second Moment -- 6.3 Matrix Algorithms Using Sampling -- 6.3.1 Matrix Multiplication Using Sampling -- 6.3.2 Implementing Length Squared Sampling in Two Passes -- 6.3.3 Sketch of a Large Matrix -- 6.4 Sketches of Documents -- 6.5 Bibliographic Notes -- 6.6 Exercises
7 Clustering -- 7.1 Introduction -- 7.1.1 Preliminaries -- 7.1.2 Two General Assumptions on the Form of Clusters -- 7.1.3 Spectral Clustering -- 7.2 k-Means Clustering -- 7.2.1 A Maximum-Likelihood Motivation -- 7.2.2 Structural Properties of the k-Means Objective -- 7.2.3 Lloyd's Algorithm -- 7.2.4 Ward's Algorithm -- 7.2.5 k-Means Clustering on the Line -- 7.3 k-Center Clustering -- 7.4 Finding Low-Error Clusterings -- 7.5 Spectral Clustering -- 7.5.1 Why Project? -- 7.5.2 The Algorithm -- 7.5.3 Means Separated by Ω(1) Standard Deviations -- 7.5.4 Laplacians -- 7.5.5 Local Spectral Clustering -- 7.6 Approximation Stability -- 7.6.1 The Conceptual Idea -- 7.6.2 Making This Formal -- 7.6.3 Algorithm and Analysis -- 7.7 High-Density Clusters -- 7.7.1 Single Linkage -- 7.7.2 Robust Linkage -- 7.8 Kernel Methods -- 7.9 Recursive Clustering Based on Sparse Cuts -- 7.10 Dense Submatrices and Communities -- 7.11 Community Finding and Graph Partitioning -- 7.12 Spectral Clustering Applied to Social Networks -- 7.13 Bibliographic Notes -- 7.14 Exercises
8 Random Graphs -- 8.1 The G(n,p) Model -- 8.1.1 Degree Distribution -- 8.1.2 Existence of Triangles in G(n, d/n) -- 8.2 Phase Transitions -- 8.3 Giant Component -- 8.3.1 Existence of a Giant Component -- 8.3.2 No Other Large Components -- 8.3.3 The Case of p < 1/n -- 8.4 Cycles and Full Connectivity -- 8.4.1 Emergence of Cycles -- 8.4.2 Full Connectivity -- 8.4.3 Threshold for O(ln n) Diameter -- 8.5 Phase Transitions for Increasing Properties -- 8.6 Branching Processes -- 8.7 CNF-SAT -- 8.7.1 SAT-Solvers in Practice -- 8.7.2 Phase Transitions for CNF-SAT -- 8.8 Nonuniform Models of Random Graphs -- 8.8.1 Giant Component in Graphs with Given Degree Distribution -- 8.9 Growth Models -- 8.9.1 Growth Model without Preferential Attachment -- 8.9.2 Growth Model with Preferential Attachment -- 8.10 Small-World Graphs -- 8.11 Bibliographic Notes -- 8.12 Exercises
9 Topic Models, Nonnegative Matrix Factorization, Hidden Markov Models, and Graphical Models -- 9.1 Topic Models -- 9.2 An Idealized Model -- 9.3 Nonnegative Matrix Factorization -- 9.4 NMF with Anchor Terms -- 9.5 Hard and Soft Clustering -- 9.6 The Latent Dirichlet Allocation Model for Topic Modeling -- 9.7 The Dominant Admixture Model -- 9.8 Formal Assumptions -- 9.9 Finding the Term-Topic Matrix -- 9.10 Hidden Markov Models -- 9.11 Graphical Models and Belief Propagation -- 9.12 Bayesian or Belief Networks -- 9.13 Markov Random Fields -- 9.14 Factor Graphs -- 9.15 Tree Algorithms -- 9.16 Message Passing in General Graphs -- 9.16.1 Graphs with a Single Cycle -- 9.16.2 Belief Update in Networks with a Single Loop -- 9.16.3 Maximum Weight Matching -- 9.17 Warning Propagation -- 9.18 Correlation between Variables -- 9.19 Bibliographic Notes -- 9.20 Exercises
10 Other Topics -- 10.1 Ranking and Social Choice -- 10.1.1 Randomization -- 10.1.2 Examples -- 10.2 Compressed Sensing and Sparse Vectors -- 10.2.1 Unique Reconstruction of a Sparse Vector -- 10.2.2 Efficiently Finding the Unique Sparse Solution -- 10.3 Applications -- 10.3.1 Biological -- 10.3.2 Low-Rank Matrices -- 10.4 An Uncertainty Principle -- 10.4.1 Sparse Vector in Some Coordinate Basis -- 10.4.2 A Representation Cannot Be Sparse in Both Time and Frequency Domains -- 10.5 Gradient -- 10.6 Linear Programming -- 10.6.1 The Ellipsoid Algorithm -- 10.7 Integer Optimization -- 10.8 Semi-Definite Programming -- 10.9 Bibliographic Notes -- 10.10 Exercises
11 Wavelets -- 11.1 Dilation -- 11.2 The Haar Wavelet -- 11.3 Wavelet Systems -- 11.4 Solving the Dilation Equation -- 11.5 Conditions on the Dilation Equation -- 11.6 Derivation of the Wavelets from the Scaling Function -- 11.7 Sufficient Conditions for the Wavelets to Be Orthogonal -- 11.8 Expressing a Function in Terms of Wavelets -- 11.9 Designing a Wavelet System -- 11.10 Applications -- 11.11 Bibliographic Notes -- 11.12 Exercises
12 Background Material -- 12.1 Definitions and Notation -- 12.1.1 Integers -- 12.1.2 Substructures -- 12.1.3 Asymptotic Notation -- 12.2 Useful Relations -- 12.3 Useful Inequalities -- 12.4 Probability -- 12.4.1 Sample Space, Events, and Independence -- 12.4.2 Linearity of Expectation -- 12.4.3 Union Bound -- 12.4.4 Indicator Variables -- 12.4.5 Variance -- 12.4.6 Variance of the Sum of Independent Random Variables -- 12.4.7 Median -- 12.4.8 The Central Limit Theorem -- 12.4.9 Probability Distributions -- 12.4.10 Bayes Rule and Estimators -- 12.5 Bounds on Tail Probability -- 12.5.1 Chernoff Bounds -- 12.5.2 More General Tail Bounds -- 12.6 Applications of the Tail Bound -- 12.7 Eigenvalues and Eigenvectors -- 12.7.1 Symmetric Matrices -- 12.7.2 Relationship between SVD and Eigen Decomposition -- 12.7.3 Extremal Properties of Eigenvalues -- 12.7.4 Eigenvalues of the Sum of Two Symmetric Matrices -- 12.7.5 Norms -- 12.7.6 Important Norms and Their Properties -- 12.7.7 Additional Linear Algebra -- 12.7.8 Distance between Subspaces -- 12.7.9 Positive Semi-Definite Matrix -- 12.8 Generating Functions -- 12.8.1 Generating Functions for Sequences Defined by Recurrence Relationships -- 12.8.2 The Exponential Generating Function and the Moment Generating Function -- 12.9 Miscellaneous -- 12.9.1 Lagrange Multipliers -- 12.9.2 Finite Fields -- 12.9.3 Application of Mean Value Theorem -- 12.10 Exercises
References -- Index.
Other edition:
Online version: Blum, Avrim, 1966-. Foundations of data science. New York, NY: Cambridge University Press, 2020. ISBN 9781108755528.
Other edition:
ISBN 9781108485067
Other edition:
ISBN 1108485065
Language:
English
URL:
Full text (license required)
URL:
https://doi.org/10.1017/9781108755528