Sponsoring Section/Society: ASA-COMP

Session Slot: 10:30-12:20 Monday

Estimated Audience Size: 50-75

AudioVisual Request: None

*Session Title: Computing for Large Mixed Models
*

Theme Session: No

Applied Session: Yes

Session Organizer: **Wolfinger, Russell**
SAS Institute Inc.

Address: SAS Institute Inc. R-52 SAS Campus Drive Cary, NC 27513

Phone: 919-677-8000 x6695

Fax: 919-677-4444

Email: sasrdw@unx.sas.com

Session Timing: 110 minutes total (Sorry about format):

First Speaker - 35 minutes Second Speaker - 35 minutes Third Speaker - 35 minutes Floor Discusion - 5 minutes

Session Chair: **Wolfinger, Russell**
SAS Institute Inc.

Address: SAS Institute Inc. R-52 SAS Campus Drive Cary, NC 27513

Phone: 919-677-8000 x6695

Fax: 919-677-4444

Email: sasrdw@unx.sas.com

*1. Making REML Feasible For Mixed-Model Inference From
Large Data Sets: Use of the Gibbs Sampler*

**Harville, David**,
IBM Thomas J. Watson Research Center

Address: Mathematical Sciences Department, IBM Thomas J. Watson Research Center P. O. Box 218, Yorktown Heights, NY 10598-0218

Phone:

Fax:

Email: harville@watson.ibm.com

Abstract: The computations encountered in the REML (restricted maximum likelihood) estimation of variance components involve the repeated (for each of a number of trial values of the variance ratios) evaluation of the trace and of various other linear combinations of the inverse, say V, of a positive definite matrix W, whose dimensions equal the number of random effects. The matrix W is the matrix obtained from the so-called mixed-model equations by absorbing the equations for the fixed effects into those for the random effects. Typically, W is sparse or exhibits other potentially useful structure. Such structure can be exploited by reexpressing the quantities to be evaluated as expected values of a random vector, say x, whose distribution is multivariate normal with variance-covariance matrix V (and with a null mean vector). The evaluation can then be carried out by an MCMC approach in which the Gibbs sampler is used to draw from the distribution of x. This approach can produce large savings in (computer) time and memory, thereby making REML estimation feasible for much larger numbers of random effects (and consequently for much larger data sets) than ever before.

*2. Fast Computing of REML Estimates in Large Linear
Mixed Models with Simple Random Effects *

**Bulavsky, Yurii**,
Institute of Computational Mathematics and Mathematical
Geophysics, Novosibirsk, Russia

Address: Institute of Computational Mathematics and Mathematical Geophysics Novosibirsk Lavrent'ev Ave., 6 630090, Russia

Phone: 007-3832-35-67-21

Fax: 007-3832-32-42-59

Email: bul@osmf.sscc.ru

Abstract: Considering some effects in a linear model to be random is often a useful tool for modeling uncertainty; however, additional computational cost can be incurred in estimating variance components. A direct implementation of the REML approach involves solving a linear system in the mixed model matrix and computing the inverse of this matrix in each iteration. If the number of levels of random effects is large then forming and directly inverting this matrix may take too much memory and time.

We consider techniques that avoid the formation and direct inversion of the mixed model matrix, thus making computations much faster. The suggested approach is based on linear algebra formulae for the inverse of an augmented matrix and for the inverse of a low rank modification. We also use some averaging approximations based on the missing-at-random assumption. This leads to approximate but very close to exact REML estimates, and for the test problems considered the differences between the exact and approximate estimates were negligible compared to confidence intervals. The new approach can be tens and hundreds times faster than traditional computations based on directly inverting the mixed model matrix.

*3. Direct and Sparse-Matrix Methods of Computing for
Mixed Linear Models *

**Smith, Stephen P.**,
Rodgers Land and Development Company

Address: Rodgers Land and Development Company 5390 Washington St. Napa, CA 94558

Phone: 510-680-6882

Fax:

Email: hucklebird@aol.com

Abstract: There are three popular approaches to directly compute estimates and predictions of effects in mixed linear models. The first approach, the QR algorithm, is suitable for rectangular design matrices. It is promoted mainly for least squares analysis, but the QR algorithm is easily adapted for general mixed models. The second and most common approach is to apply Gaussian elimination or the Cholesky decomposition to the normal equations or the mixed model equations. The third approach is less common, and involves special matrix operations that are suitable for symmetric and indefinite systems of linear equations, called the augmented equations. Because the augmented coefficient matrix is indefinite, it may be necessary to use the Bunch and Parlett factorization to secure numerical stability. The Cholesky decomposition exists for selected row and column permutations of the augmented coefficient matrix, but Cholesky's algorithm may show numerical instability due to the indefiniteness of the coefficient matrix. All of these direct approaches have been adapted to treat sparse matrices, and this enhances their applicability for large mixed models, where computing resources (space and time) are issues. Interestingly, likelihood evaluation is a trivial by-product of prediction and estimation. Hence, variance component estimation is possible via restricted maximum likelihood, by maximizing the objective function that is assembled from the intermediate calculations that result from one of the three approaches. First and second derivatives are useful to optimize the objective function, but these are available via automatic differentiation (forward and backward derivatives). The differentiation of the Cholesky algorithm is a useful result not only for the second approach, but also for the first because the QR algorithm provides an alternative method to compute the Cholesky decomposition. The differentiation of the Bunch and Parlett factorization is also useful, but this needs development.

List of speakers who are nonmembers: None