Least-squares variance component estimation - Theory and GPS applications [download]

weixin_39821746 2019-05-19 07:30:16
Least-squares variance component estimation - Theory and GPS applications
Related download link: //download.csdn.net/download/pengwenzhi/2269175?utm_source=bbsseo
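For orientation on the topic of the download: in least-squares variance component estimation (LS-VCE) the covariance matrix of the observations is modelled as a linear combination of known cofactor matrices, Q_yy = Σ_k σ_k Q_k, and the (co)variance components σ_k are estimated from the weighted least-squares residuals by solving a small system of normal equations, iterated because the weights themselves depend on σ. The sketch below is only a minimal illustration of that iteration in the form commonly quoted in the geodetic literature (assuming no known covariance part Q_0 and small, dense matrices); it is not code from the downloaded thesis, and the names in it (ls_vce, Q_parts, ...) are invented for the example.

```python
import numpy as np

def ls_vce(y, A, Q_parts, sigma0=None, n_iter=50, tol=1e-10):
    """Minimal LS-VCE sketch (assumed standard form): estimate the
    components sigma_k of Q_yy = sum_k sigma_k * Q_parts[k] in the
    linear model E{y} = A x, by iterating the normal equations
    N sigma = rhs built from the weighted least-squares residuals."""
    m, n = A.shape
    p = len(Q_parts)
    sigma = np.ones(p) if sigma0 is None else np.asarray(sigma0, dtype=float)

    for _ in range(n_iter):
        Qyy = sum(s * Qk for s, Qk in zip(sigma, Q_parts))
        W = np.linalg.inv(Qyy)                       # weight matrix Q_yy^{-1}
        AtW = A.T @ W
        # Residual projector: e_hat = (I - A (A'WA)^{-1} A'W) y
        P = np.eye(m) - A @ np.linalg.solve(AtW @ A, AtW)
        e_hat = P @ y
        WP = W @ P                                   # recurring product
        N = np.empty((p, p))
        rhs = np.empty(p)
        for k, Qk in enumerate(Q_parts):
            rhs[k] = 0.5 * e_hat @ W @ Qk @ W @ e_hat
            for j, Qj in enumerate(Q_parts):
                N[k, j] = 0.5 * np.trace(WP @ Qk @ WP @ Qj)
        sigma_new = np.linalg.solve(N, rhs)
        if np.max(np.abs(sigma_new - sigma)) < tol * (1 + np.max(np.abs(sigma))):
            return sigma_new
        sigma = sigma_new
    return sigma

# Toy check: with a single component sigma * I this reduces to the
# familiar unbiased estimate e'e / (m - n), so the printed value should
# land near the true variance 4.0, up to sampling noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
y = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=2.0, size=50)
print(ls_vce(y, A, [np.eye(50)]))
```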
Contents

Preface

Part One — Matrices

1 Basic properties of vectors and matrices: 1 Introduction; 2 Sets; 3 Matrices: addition and multiplication; 4 The transpose of a matrix; 5 Square matrices; 6 Linear forms and quadratic forms; 7 The rank of a matrix; 8 The inverse; 9 The determinant; 10 The trace; 11 Partitioned matrices; 12 Complex matrices; 13 Eigenvalues and eigenvectors; 14 Schur's decomposition theorem; 15 The Jordan decomposition; 16 The singular-value decomposition; 17 Further results concerning eigenvalues; 18 Positive (semi)definite matrices; 19 Three further results for positive definite matrices; 20 A useful result; Miscellaneous exercises; Bibliographical notes

2 Kronecker products, the vec operator and the Moore-Penrose inverse: 1 Introduction; 2 The Kronecker product; 3 Eigenvalues of a Kronecker product; 4 The vec operator; 5 The Moore-Penrose (MP) inverse; 6 Existence and uniqueness of the MP inverse; 7 Some properties of the MP inverse; 8 Further properties; 9 The solution of linear equation systems; Miscellaneous exercises; Bibliographical notes

3 Miscellaneous matrix results: 1 Introduction; 2 The adjoint matrix; 3 Proof of Theorem 1; 4 Bordered determinants; 5 The matrix equation AX = 0; 6 The Hadamard product; 7 The commutation matrix K_mn; 8 The duplication matrix D_n; 9 Relationship between D_{n+1} and D_n, I; 10 Relationship between D_{n+1} and D_n, II; 11 Conditions for a quadratic form to be positive (negative) subject to linear constraints; 12 Necessary and sufficient conditions for r(A : B) = r(A) + r(B); 13 The bordered Gramian matrix; 14 The equations X_1 A + X_2 B' = G_1, X_1 B = G_2; Miscellaneous exercises; Bibliographical notes

Part Two — Differentials: the theory

4 Mathematical preliminaries: 1 Introduction; 2 Interior points and accumulation points; 3 Open and closed sets; 4 The Bolzano-Weierstrass theorem; 5 Functions; 6 The limit of a function; 7 Continuous functions and compactness; 8 Convex sets; 9 Convex and concave functions; Bibliographical notes

5 Differentials and differentiability: 1 Introduction; 2 Continuity; 3 Differentiability and linear approximation; 4 The differential of a vector function; 5 Uniqueness of the differential; 6 Continuity of differentiable functions; 7 Partial derivatives; 8 The first identification theorem; 9 Existence of the differential, I; 10 Existence of the differential, II; 11 Continuous differentiability; 12 The chain rule; 13 Cauchy invariance; 14 The mean-value theorem for real-valued functions; 15 Matrix functions; 16 Some remarks on notation; Miscellaneous exercises; Bibliographical notes

6 The second differential: 1 Introduction; 2 Second-order partial derivatives; 3 The Hessian matrix; 4 Twice differentiability and second-order approximation, I; 5 Definition of twice differentiability; 6 The second differential; 7 (Column) symmetry of the Hessian matrix; 8 The second identification theorem; 9 Twice differentiability and second-order approximation, II; 10 Chain rule for Hessian matrices; 11 The analogue for second differentials; 12 Taylor's theorem for real-valued functions; 13 Higher-order differentials; 14 Matrix functions; Bibliographical notes

7 Static optimization: 1 Introduction; 2 Unconstrained optimization; 3 The existence of absolute extrema; 4 Necessary conditions for a local minimum; 5 Sufficient conditions for a local minimum: first-derivative test; 6 Sufficient conditions for a local minimum: second-derivative test; 7 Characterization of differentiable convex functions; 8 Characterization of twice differentiable convex functions; 9 Sufficient conditions for an absolute minimum; 10 Monotonic transformations; 11 Optimization subject to constraints; 12 Necessary conditions for a local minimum under constraints; 13 Sufficient conditions for a local minimum under constraints; 14 Sufficient conditions for an absolute minimum under constraints; 15 A note on constraints in matrix form; 16 Economic interpretation of Lagrange multipliers; Appendix: the implicit function theorem; Bibliographical notes

Part Three — Differentials: the practice

8 Some important differentials: 1 Introduction; 2 Fundamental rules of differential calculus; 3 The differential of a determinant; 4 The differential of an inverse; 5 Differential of the Moore-Penrose inverse; 6 The differential of the adjoint matrix; 7 On differentiating eigenvalues and eigenvectors; 8 The differential of eigenvalues and eigenvectors: symmetric case; 9 The differential of eigenvalues and eigenvectors: complex case; 10 Two alternative expressions for dλ; 11 Second differential of the eigenvalue function; 12 Multiple eigenvalues; Miscellaneous exercises; Bibliographical notes

9 First-order differentials and Jacobian matrices: 1 Introduction; 2 Classification; 3 Bad notation; 4 Good notation; 5 Identification of Jacobian matrices; 6 The first identification table; 7 Partitioning of the derivative; 8 Scalar functions of a vector; 9 Scalar functions of a matrix, I: trace; 10 Scalar functions of a matrix, II: determinant; 11 Scalar functions of a matrix, III: eigenvalue; 12 Two examples of vector functions; 13 Matrix functions; 14 Kronecker products; 15 Some other problems; Bibliographical notes

10 Second-order differentials and Hessian matrices: 1 Introduction; 2 The Hessian matrix of a matrix function; 3 Identification of Hessian matrices; 4 The second identification table; 5 An explicit formula for the Hessian matrix; 6 Scalar functions; 7 Vector functions; 8 Matrix functions, I; 9 Matrix functions, II

Part Four — Inequalities

11 Inequalities: 1 Introduction; 2 The Cauchy-Schwarz inequality; 3 Matrix analogues of the Cauchy-Schwarz inequality; 4 The theorem of the arithmetic and geometric means; 5 The Rayleigh quotient; 6 Concavity of λ_1, convexity of λ_n; 7 Variational description of eigenvalues; 8 Fischer's min-max theorem; 9 Monotonicity of the eigenvalues; 10 The Poincaré separation theorem; 11 Two corollaries of Poincaré's theorem; 12 Further consequences of the Poincaré theorem; 13 Multiplicative version; 14 The maximum of a bilinear form; 15 Hadamard's inequality; 16 An interlude: Karamata's inequality; 17 Karamata's inequality applied to eigenvalues; 18 An inequality concerning positive semidefinite matrices; 19 A representation theorem for (Σ a_i^p)^{1/p}; 20 A representation theorem for (tr A^p)^{1/p}; 21 Hölder's inequality; 22 Concavity of log|A|; 23 Minkowski's inequality; 24 Quasilinear representation of |A|^{1/n}; 25 Minkowski's determinant theorem; 26 Weighted means of order p; 27 Schlömilch's inequality; 28 Curvature properties of M_p(x, a); 29 Least squares; 30 Generalized least squares; 31 Restricted least squares; 32 Restricted least squares: matrix version; Miscellaneous exercises; Bibliographical notes

Part Five — The linear model

12 Statistical preliminaries: 1 Introduction; 2 The cumulative distribution function; 3 The joint density function; 4 Expectations; 5 Variance and covariance; 6 Independence of two random variables; 7 Independence of n random variables; 8 Sampling; 9 The one-dimensional normal distribution; 10 The multivariate normal distribution; 11 Estimation; Miscellaneous exercises; Bibliographical notes

13 The linear regression model: 1 Introduction; 2 Affine minimum-trace unbiased estimation; 3 The Gauss-Markov theorem; 4 The method of least squares; 5 Aitken's theorem; 6 Multicollinearity; 7 Estimable functions; 8 Linear constraints: the case M(R') ⊂ M(X'); 9 Linear constraints: the general case; 10 Linear constraints: the case M(R') ∩ M(X') = {0}; 11 A singular variance matrix: the case M(X) ⊂ M(V); 12 A singular variance matrix: the case r(X'V^+X) = r(X); 13 A singular variance matrix: the general case, I; 14 Explicit and implicit linear constraints; 15 The general linear model, I; 16 A singular variance matrix: the general case, II; 17 The general linear model, II; 18 Generalized least squares; 19 Restricted least squares; Miscellaneous exercises; Bibliographical notes

14 Further topics in the linear model: 1 Introduction; 2 Best quadratic unbiased estimation of σ²; 3 The best quadratic and positive unbiased estimator of σ²; 4 The best quadratic unbiased estimator of σ²; 5 Best quadratic invariant estimation of σ²; 6 The best quadratic and positive invariant estimator of σ²; 7 The best quadratic invariant estimator of σ²; 8 Best quadratic unbiased estimation: multivariate normal case; 9 Bounds for the bias of the least squares estimator of σ², I; 10 Bounds for the bias of the least squares estimator of σ², II; 11 The prediction of disturbances; 12 Best linear unbiased predictors with scalar variance matrix; 13 Best linear unbiased predictors with fixed variance matrix, I; 14 Best linear unbiased predictors with fixed variance matrix, II; 15 Local sensitivity of the posterior mean; 16 Local sensitivity of the posterior precision; Bibliographical notes

Part Six — Applications to maximum likelihood estimation

15 Maximum likelihood estimation: 1 Introduction; 2 The method of maximum likelihood (ML); 3 ML estimation of the multivariate normal distribution; 4 Symmetry: implicit versus explicit treatment; 5 The treatment of positive definiteness; 6 The information matrix; 7 ML estimation of the multivariate normal distribution: distinct means; 8 The multivariate linear regression model; 9 The errors-in-variables model; 10 The non-linear regression model with normal errors; 11 Special case: functional independence of mean and variance parameters; 12 Generalization of Theorem 6; Miscellaneous exercises; Bibliographical notes

16 Simultaneous equations: 1 Introduction; 2 The simultaneous equations model; 3 The identification problem; 4 Identification with linear constraints on B and Γ only; 5 Identification with linear constraints on B, Γ and Σ; 6 Non-linear constraints; 7 Full-information maximum likelihood (FIML): the information matrix (general case); 8 Full-information maximum likelihood (FIML): the asymptotic variance matrix (special case); 9 Limited-information maximum likelihood (LIML): the first-order conditions; 10 Limited-information maximum likelihood (LIML): the information matrix; 11 Limited-information maximum likelihood (LIML): the asymptotic variance matrix; Bibliographical notes

17 Topics in psychometrics: 1 Introduction; 2 Population principal components; 3 Optimality of principal components; 4 A related result; 5 Sample principal components; 6 Optimality of sample principal components; 7 Sample analogue of Theorem 3; 8 One-mode component analysis; 9 One-mode component analysis and sample principal components; 10 Two-mode component analysis; 11 Multimode component analysis; 12 Factor analysis; 13 A zigzag routine; 14 A Newton-Raphson routine; 15 Kaiser's varimax method; 16 Canonical correlations and variates in the population; Bibliographical notes

Bibliography
Index of symbols
Subject index
The book's table of contents:

Contents

Website; Acknowledgments; Notation

1 Introduction: 1.1 Who Should Read This Book?; 1.2 Historical Trends in Deep Learning

I Applied Math and Machine Learning Basics

2 Linear Algebra: 2.1 Scalars, Vectors, Matrices and Tensors; 2.2 Multiplying Matrices and Vectors; 2.3 Identity and Inverse Matrices; 2.4 Linear Dependence and Span; 2.5 Norms; 2.6 Special Kinds of Matrices and Vectors; 2.7 Eigendecomposition; 2.8 Singular Value Decomposition; 2.9 The Moore-Penrose Pseudoinverse; 2.10 The Trace Operator; 2.11 The Determinant; 2.12 Example: Principal Components Analysis

3 Probability and Information Theory: 3.1 Why Probability?; 3.2 Random Variables; 3.3 Probability Distributions; 3.4 Marginal Probability; 3.5 Conditional Probability; 3.6 The Chain Rule of Conditional Probabilities; 3.7 Independence and Conditional Independence; 3.8 Expectation, Variance and Covariance; 3.9 Common Probability Distributions; 3.10 Useful Properties of Common Functions; 3.11 Bayes' Rule; 3.12 Technical Details of Continuous Variables; 3.13 Information Theory; 3.14 Structured Probabilistic Models

4 Numerical Computation: 4.1 Overflow and Underflow; 4.2 Poor Conditioning; 4.3 Gradient-Based Optimization; 4.4 Constrained Optimization; 4.5 Example: Linear Least Squares

5 Machine Learning Basics: 5.1 Learning Algorithms; 5.2 Capacity, Overfitting and Underfitting; 5.3 Hyperparameters and Validation Sets; 5.4 Estimators, Bias and Variance; 5.5 Maximum Likelihood Estimation; 5.6 Bayesian Statistics; 5.7 Supervised Learning Algorithms; 5.8 Unsupervised Learning Algorithms; 5.9 Stochastic Gradient Descent; 5.10 Building a Machine Learning Algorithm; 5.11 Challenges Motivating Deep Learning

II Deep Networks: Modern Practices

6 Deep Feedforward Networks: 6.1 Example: Learning XOR; 6.2 Gradient-Based Learning; 6.3 Hidden Units; 6.4 Architecture Design; 6.5 Back-Propagation and Other Differentiation Algorithms; 6.6 Historical Notes

7 Regularization for Deep Learning: 7.1 Parameter Norm Penalties; 7.2 Norm Penalties as Constrained Optimization; 7.3 Regularization and Under-Constrained Problems; 7.4 Dataset Augmentation; 7.5 Noise Robustness; 7.6 Semi-Supervised Learning; 7.7 Multitask Learning; 7.8 Early Stopping; 7.9 Parameter Tying and Parameter Sharing; 7.10 Sparse Representations; 7.11 Bagging and Other Ensemble Methods; 7.12 Dropout; 7.13 Adversarial Training; 7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier

8 Optimization for Training Deep Models: 8.1 How Learning Differs from Pure Optimization; 8.2 Challenges in Neural Network Optimization; 8.3 Basic Algorithms; 8.4 Parameter Initialization Strategies; 8.5 Algorithms with Adaptive Learning Rates; 8.6 Approximate Second-Order Methods; 8.7 Optimization Strategies and Meta-Algorithms

9 Convolutional Networks: 9.1 The Convolution Operation; 9.2 Motivation; 9.3 Pooling; 9.4 Convolution and Pooling as an Infinitely Strong Prior; 9.5 Variants of the Basic Convolution Function; 9.6 Structured Outputs; 9.7 Data Types; 9.8 Efficient Convolution Algorithms; 9.9 Random or Unsupervised Features; 9.10 The Neuroscientific Basis for Convolutional Networks; 9.11 Convolutional Networks and the History of Deep Learning

10 Sequence Modeling: Recurrent and Recursive Nets: 10.1 Unfolding Computational Graphs; 10.2 Recurrent Neural Networks; 10.3 Bidirectional RNNs; 10.4 Encoder-Decoder Sequence-to-Sequence Architectures; 10.5 Deep Recurrent Networks; 10.6 Recursive Neural Networks; 10.7 The Challenge of Long-Term Dependencies; 10.8 Echo State Networks; 10.9 Leaky Units and Other Strategies for Multiple Time Scales; 10.10 The Long Short-Term Memory and Other Gated RNNs; 10.11 Optimization for Long-Term Dependencies; 10.12 Explicit Memory

11 Practical Methodology: 11.1 Performance Metrics; 11.2 Default Baseline Models; 11.3 Determining Whether to Gather More Data; 11.4 Selecting Hyperparameters; 11.5 Debugging Strategies; 11.6 Example: Multi-Digit Number Recognition

12 Applications: 12.1 Large-Scale Deep Learning; 12.2 Computer Vision; 12.3 Speech Recognition; 12.4 Natural Language Processing; 12.5 Other Applications

III Deep Learning Research

13 Linear Factor Models: 13.1 Probabilistic PCA and Factor Analysis; 13.2 Independent Component Analysis (ICA); 13.3 Slow Feature Analysis; 13.4 Sparse Coding; 13.5 Manifold Interpretation of PCA

14 Autoencoders: 14.1 Undercomplete Autoencoders; 14.2 Regularized Autoencoders; 14.3 Representational Power, Layer Size and Depth; 14.4 Stochastic Encoders and Decoders; 14.5 Denoising Autoencoders; 14.6 Learning Manifolds with Autoencoders; 14.7 Contractive Autoencoders; 14.8 Predictive Sparse Decomposition; 14.9 Applications of Autoencoders

15 Representation Learning: 15.1 Greedy Layer-Wise Unsupervised Pretraining; 15.2 Transfer Learning and Domain Adaptation; 15.3 Semi-Supervised Disentangling of Causal Factors; 15.4 Distributed Representation; 15.5 Exponential Gains from Depth; 15.6 Providing Clues to Discover Underlying Causes

16 Structured Probabilistic Models for Deep Learning: 16.1 The Challenge of Unstructured Modeling; 16.2 Using Graphs to Describe Model Structure; 16.3 Sampling from Graphical Models; 16.4 Advantages of Structured Modeling; 16.5 Learning about Dependencies; 16.6 Inference and Approximate Inference; 16.7 The Deep Learning Approach to Structured Probabilistic Models

17 Monte Carlo Methods: 17.1 Sampling and Monte Carlo Methods; 17.2 Importance Sampling; 17.3 Markov Chain Monte Carlo Methods; 17.4 Gibbs Sampling; 17.5 The Challenge of Mixing between Separated Modes

18 Confronting the Partition Function: 18.1 The Log-Likelihood Gradient; 18.2 Stochastic Maximum Likelihood and Contrastive Divergence; 18.3 Pseudolikelihood; 18.4 Score Matching and Ratio Matching; 18.5 Denoising Score Matching; 18.6 Noise-Contrastive Estimation; 18.7 Estimating the Partition Function

19 Approximate Inference: 19.1 Inference as Optimization; 19.2 Expectation Maximization; 19.3 MAP Inference and Sparse Coding; 19.4 Variational Inference and Learning; 19.5 Learned Approximate Inference

20 Deep Generative Models: 20.1 Boltzmann Machines; 20.2 Restricted Boltzmann Machines; 20.3 Deep Belief Networks; 20.4 Deep Boltzmann Machines; 20.5 Boltzmann Machines for Real-Valued Data; 20.6 Convolutional Boltzmann Machines; 20.7 Boltzmann Machines for Structured or Sequential Outputs; 20.8 Other Boltzmann Machines; 20.9 Back-Propagation through Random Operations; 20.10 Directed Generative Nets; 20.11 Drawing Samples from Autoencoders; 20.12 Generative Stochastic Networks; 20.13 Other Generation Schemes; 20.14 Evaluating Generative Models; 20.15 Conclusion

Bibliography
Index
