\documentclass[12pt]{report}

\setlength{\textwidth}{6.5in}
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{0.0in}
\setlength{\oddsidemargin}{0.0in}


\usepackage{graphics}
\usepackage{curves}
\usepackage{tikz}

\begin{document}

Version: \today

\Huge

\begin{center}

\textbf{Notes on the Multivariate Normal and Related Topics}

\end{center}


\Large



\bigskip

Let me refresh your memory about the distinctions between \emph{population} and \emph{sample}; \emph{parameters} and \emph{statistics}; \emph{population distributions} and \emph{sampling distributions}.  One might say that anyone worth knowing, knows these distinctions.  We start with the simple univariate framework and then move on to the multivariate context.

\bigskip

A) Begin by positing a \emph{population} of interest that is operationalized  by some random variable, say $X$.  In this Theory World framework, $X$ is characterized by \emph{parameters}, such as the expectation of $X$, $\mu = \mathrm{E}(X)$, or its variance, $\sigma^{2} = \mathrm{V}(X)$.  The random variable $X$ has a \emph{(population) distribution}, which for us will typically be assumed normal.

\bigskip


B) A \emph{sample} is generated by taking observations on $X$, say, $X_{1}, \ldots, X_{n}$, considered independent and identically distributed as $X$, i.e., they are exact copies of $X$.   In this Data World context, statistics are functions of the sample, and therefore, characterize the sample: the sample mean, $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_{i}$; the sample variance, $\hat{\sigma}^{2} = \frac{1}{n} \sum_{i=1}^{n} (X_{i} - \hat{\mu})^{2}$, with some possible variation in dividing by $n-1$ to generate an unbiased estimator of $\sigma^{2}$.  The statistics, $\hat{\mu}$ and $\hat{\sigma}^{2}$, are \emph{point estimators} of $\mu$ and $\sigma^{2}$.  They are random variables by themselves, so they have distributions that are called \emph{sampling distributions}.

\bigskip

The general problem of statistical inference is to ask what the sample statistics, $\hat{\mu}$ and $\hat{\sigma}^{2}$, tell us about their population counterparts, such as $\mu$ and $\sigma^{2}$.  Can we obtain some notion of accuracy from the sampling distribution, e.g., confidence intervals?

\bigskip

The multivariate problem is generally the same but with more notation:

\bigskip

A)  The population (Theory World) is characterized by a collection of $p$ random variables, $\mathbf{X}' = [X_{1}, X_{2}, \ldots, X_{p}]$, with parameters:    $\mu_{i} = \mathrm{E}(X_{i})$; $\sigma_{i}^{2} = \mathrm{V}(X_{i})$; $\sigma_{ij} = \mathrm{Cov}(X_{i}, X_{j}) = \mathrm{E}[(X_{i} - \mathrm{E}(X_{i}))(X_{j} - \mathrm{E}(X_{j}))]$; or the correlation between $X_{i}$ and $X_{j}$, $\rho_{ij} = \sigma_{ij}/\sigma_{i}\sigma_{j}$.

\bigskip

B)  The sample (Data World) is defined by $n$ independent observations on the random vector $\mathbf{X}$, with the observations placed into an $n \times p$ data matrix (e.g., subject by variable) that we also (with a slight abuse of notation) denote by a bold-face capital letter,  $\mathbf{X}_{n \times p}$: \[ \mathbf{X}_{n \times p} = \left( \begin{array}{cccc}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
 \vdots & & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np} \\

\end{array} \right) = \left( \begin{array}{c}

\mathbf{x}_{1}' \\

\mathbf{x}_{2}' \\

\vdots \\

\mathbf{x}_{n}' \\

\end{array} \right) \ . \] The statistics corresponding to the parameters of the population are: $\hat{\mu_{i}} = \frac{1}{n} \sum_{k=1}^{n} x_{ki}$; $\hat{\sigma_{i}}^{2} = \frac{1}{n} \sum_{k=1}^{n} (x_{ki} - \hat{\mu_{i}})^{2}$, with again some possible variation to a division by $n-1$ for an unbiased estimate; $\hat{\sigma}_{ij} = \frac{1}{n} \sum_{k=1}^{n} (x_{ki} - \hat{\mu_{i}})(x_{kj} - \hat{\mu_{j}})$; and $\hat{\rho}_{ij} = \hat{\sigma}_{ij} / \hat{\sigma}_{i}\hat{\sigma}_{j}$.

\bigskip

To obtain a good sense of what the estimators tell us about the population parameters, we will have to make some assumption about the population, e.g., $[X_{1},\ldots,X_{p}]$ has a  multivariate normal distribution.  As we will see, this assumption leads to some very nice results.

\section{Developing the Multivariate Normal Distribution}

Suppose $X_{1}, \ldots, X_{p}$ are $p$ continuous random variables with density functions, $f_{1}(x_{1}), \ldots, f_{p}(x_{p})$,  and distribution functions, $F_{1}(x_{1}), \ldots, F_{p}(x_{p})$, where \[ P(X_{i} \le x_{i}) = F_{i}(x_{i}) = \int_{-\infty}^{x_{i}} f_{i}(x_{i}) dx \ . \]  We define a $p$-dimensional random variable (or random vector) as the vector, $\mathbf{X}' = [X_{1}, \ldots, X_{p}]$; $\mathbf{X}$ has the joint cumulative distribution function \[ F(x_{1}, \ldots, x_{p}) = \int_{-\infty}^{x_{p}} \cdots \int_{-\infty}^{x_{1}} f(x_{1},\ldots,x_{p}) dx_{1} \cdots dx_{p} \ . \]  If the random variables, $X_{1}, \ldots, X_{p}$ are independent, then the joint density  and cumulative distribution functions factor: $f(x_{1}, \ldots, x_{p}) = f_{1}(x_{1}) \cdots f_{p}(x_{p})$ and $F(x_{1}, \ldots, x_{p}) = F_{1}(x_{1}) \cdots F_{p}(x_{p})$.  The independence property will not be assumed in the following;  in fact, the whole idea is to investigate what type of dependency exists in the random vector $\mathbf{X}$.

\bigskip

When $\mathbf{X}' = [X_{1}, X_{2}, \ldots, X_{p}] \sim \mathrm{MVN}(\mbox{\boldmath $\mu$}, \mbox{\boldmath $\Sigma$})$, the joint density function has the form \[ \phi(x_{1}, \ldots, x_{p}) = \frac{1}{(2\pi)^{p/2} |\mbox{\boldmath $\Sigma$}|^{1/2}} \exp \{ -\frac{1}{2} (\mathbf{x} - \mbox{\boldmath $\mu$})' \mbox{\boldmath $\Sigma$}^{-1} (\mathbf{x} - \mbox{\boldmath $\mu$}) \} \ . \]  In the univariate case, the density provides the usual bell-shaped curve: $\phi(x) = \frac{1}{\sqrt{2\pi} \sigma} \exp \{-\frac{1}{2} [(x - \mu)/\sigma)^{2}] \}$.

\bigskip

Given the MVN assumption on the generation of the data matrix, $\mathbf{X}_{n \times p}$, we know the sampling distribution of the vector of means: \[ \mbox{\boldmath $\hat{\mu}$} = \left( \begin{array}{c}
\hat{\mu}_{1} \\
\hat{\mu}_{2} \\
\vdots \\
\hat{\mu}_{p} \\

\end{array} \right) \sim \mathrm{MVN}(\mbox{\boldmath $\mu$}, (1/n)\mbox{\boldmath $\Sigma$}) \] In the univariate case, we have that $\hat{\mu} \sim \mathrm{N}(\mu, (1/n)\sigma^{2})$.  As might be expected, a Central Limit Theorem (CLT) states that these results also hold asymptotically when the MVN assumption is relaxed.  Also, if we estimate the variance-covariance matrix,  $\mbox{\boldmath $\Sigma$}$, with divisions by $n-1$ rather than $n$, and denote the result as \mbox{\boldmath $\hat{\Sigma}$}, then E(\mbox{\boldmath $\hat{\Sigma}$})   = \mbox{\boldmath $\Sigma$}.

\bigskip

Linear combinations of random variables form the backbone of all of multivariate statistics.  For example, suppose $\mathbf{X}' = [X_{1}, X_{2}, \ldots, X_{p}] \sim \mathrm{MVN}(\mbox{\boldmath $\mu$}, \mbox{\boldmath $\Sigma$})$, and consider the following $q$ linear combinations: \[ \begin{array}{c}
Z_{1} = c_{11}X_{1} + \cdots + c_{1p}X_{p} \\
\vdots \\
Z_{q} = c_{q1}X_{1} + \cdots + c_{qp}X_{p} \\
\end{array} \]  Then the vector $\mathbf{Z}' = [Z_{1}, Z_{2}, \ldots, Z_{q}] \sim \mathrm{MVN}(\mathbf{C}\mbox{\boldmath $\mu$}, \mathbf{C}\mbox{\boldmath $\Sigma$}\mathbf{C}')$, where \[ \mathbf{C}_{q \times p} =  \left( \begin{array}{ccc}
c_{11} & \cdots & c_{1p} \\
 \vdots & & \vdots \\
 c_{q1} & \cdots & c_{qp} \\
 \end{array} \right) \ . \]  These same results hold in the sample if we observe the $q$ linear combinations, $[Z_{1}, Z_{2}, \ldots, Z_{q}]$: i.e., the sample mean vector is $\mathbf{C}\mbox{\boldmath $\hat{\mu}$}$ and the sample variance-covariance matrix is $\mathbf{C}\mbox{\boldmath $\hat{\Sigma}$}\mathbf{C}'$.

\section{Multiple Regression and Partial Correlation (in the Population)}

Suppose we partition our vector of $p$ random variables as follows: \[ \mathbf{X}' = [X_{1}, X_{2}, \ldots, X_{p}] = \] \[ [[X_{1}, X_{2}, \ldots, X_{k}],[X_{k+1}, X_{k+2} \ldots, X_{p}]] \equiv [\mathbf{X}_{1}', \mathbf{X}_ {2}'] \] Partitioning the mean vector and variance-covariance matrix in the same way, \[\mbox{\boldmath $\mu$} = \left( \begin{array}{c}

\mbox{\boldmath $\mu_{1}$} \\
\mbox{\boldmath $\mu_{2}$} \\

\end{array} \right) \ ;   \  \mbox{\boldmath $\Sigma$} = \left( \begin{array}{cc}

\mbox{\boldmath $\Sigma_{11}$} & \mbox{\boldmath $\Sigma_{12}$} \\
\mbox{\boldmath $\Sigma_{12}'$}  & \mbox{\boldmath $\Sigma_{22}$}\\

\end{array} \right) ,    \] we have \[ \mathbf{X}_{1}' \sim \mathrm{MVN}(\mbox{\boldmath $\mu_{1}$}, \mbox{\boldmath $\Sigma_{11}$}) \ ; \  \mathbf{X}_{2}' \sim \mathrm{MVN}(\mbox{\boldmath $\mu_{2}$}, \mbox{\boldmath $\Sigma_{22}$}) \ . \] $\mathbf{X}_{1}'$ and $\mathbf{X}_{2}'$ are statistically independent vectors if and only if $\mbox{\boldmath $\Sigma_{12}$} = \mbox{\boldmath $0_{k \times p}$}  $, the $k \times p$ zero matrix.

\bigskip

What I call the Master Theorem refers to the conditional density of $\mathbf{X}_{1}$ given $\mathbf{X}_{2} = \mathbf{x}_{2}$; it is multivariate normal with mean vector of order $k \times 1$: \[ \mbox{\boldmath $\mu_{1}$} + \mbox{\boldmath $\Sigma_{12}$} \mbox{\boldmath $\Sigma_{22}^{-1}$}(\mathbf{x}_{2} - \mbox{\boldmath $\mu_{2}$}) \ , \] and variance-covariance matrix of order $k \times k$: \[ \mbox{\boldmath $\Sigma_{11}$} -  \mbox{\boldmath $\Sigma_{12}$} \mbox{\boldmath $\Sigma_{22}^{-1}$}\mbox{\boldmath $\Sigma_{12}'$} \ . \]  If we denote the $(i,j)^{th}$  partial covariance element in this latter matrix (`holding' all variables in the second set `constant') as $\sigma_{ij \cdot (k+1) \cdots p}$, then the partial correlation of variables $i$ and $j$,  `holding' all variables in the second set `constant', is \[ \rho_{ij \cdot (k+1) \cdots p} = \sigma_{ij \cdot (k+1) \cdots p} / \sqrt{\sigma_{ii \cdot (k+1) \cdots p} \sigma_{jj \cdot (k+1) \cdots p}} \ . \]  Notice that in the formulas just given, the inverse of $\mbox{\boldmath $\Sigma_{22}$}$ must exist; otherwise, we have what is called ``multicollinearity,'' and solutions don't exist (or are very unstable numerically if the inverse `almost' doesn't exist).  So, as one moral: don't use both total  and subtest scores in the same set of ``independent'' variables.

\bigskip

If the set $\mathbf{X}_{1}$ contains just one random variable, $X_{1}$, then the mean vector of the conditional distribution can be given as \[ \mathrm{E}(X_{1}) + \mbox{\boldmath $\sigma_{12}'$} \mbox{\boldmath $\Sigma_{22}^{-1}$}(\mathbf{x}_{2} - \mbox{\boldmath $\mu_{2}$}) \ , \] where \mbox{\boldmath $\sigma_{12}'$} is $ 1 \times (p-1)$ and of the form $[\mathrm{Cov}(X_{1},X_{2}), \ldots, \mathrm{Cov}(X_{1},X_{p})]$.  This is nothing but our old regression equation written out in matrix notation.  If we let $\mbox{\boldmath $\beta'$} =  \mbox{\boldmath $\sigma_{12}'$} \mbox{\boldmath $\Sigma_{22}^{-1}$}$, then the conditional mean vector (predicted $X_{1}$) is equal to $\mathrm{E}(X_{1}) + \mbox{\boldmath $\beta'$}(\mathbf{x}_{2} - \mbox{\boldmath $\mu_{2}$}) = \mathrm{E}(X_{1}) + \beta_{1}(x_{2} - \mu_{2}) + \cdots + \beta_{p-1}(x_{p} - \mu_{p})$.  The covariance matrix, $ \mbox{\boldmath $\Sigma_{11}$} -  \mbox{\boldmath $\Sigma_{12}$} \mbox{\boldmath $\Sigma_{22}^{-1}$}\mbox{\boldmath $\Sigma_{12}'$}$, takes on the form $ \mathrm{Var}(X_{1}) -  \mbox{\boldmath $\sigma_{12}$} \mbox{\boldmath $\Sigma_{22}^{-1}$}\mbox{\boldmath $\sigma_{12}'$}$, i.e., the variation in $X_{1}$ that is \emph{not explained}.  If we take explained variation, $\mbox{\boldmath $\sigma_{12}$} \mbox{\boldmath $\Sigma_{22}^{-1}$}\mbox{\boldmath $\sigma_{12}'$}$, and consider the proportion to the total variance, $\mathrm{Var}(X_{1})$, we define the squared multiple correlation coefficient: \[ \rho_{1 \cdot 2 \cdots p}^{2} = \frac{\mbox{\boldmath $\sigma_{12}$} \mbox{\boldmath $\Sigma_{22}^{-1}$}\mbox{\boldmath $\sigma_{12}'$}}{\mathrm{Var}(X_{1})} \ . \]  In fact, the linear combination, $\mbox{\boldmath $\beta'$}\mathbf{X}_{2}$, has the highest correlation of any linear combination with $X_{1}$; and this correlation is the positive square root of the squared multiple correlation coefficient.

\section{Moving to the Sample}

Up to this point, the concepts of regression and kindred ideas, have been discussed only in terms of population parameters.  We now have  the task of obtaining estimates of these various quantities based on a sample; also, associated inference procedures need to be developed.  Generally, we will rely on maximum-likelihood estimation and the related likelihood-ratio tests.

\bigskip

To define how maximum-likelihood estimation proceeds, we will first give a series of general steps, and then operationalize for a univariate normal distribution example.

\bigskip

A) Let $X_{1}, \ldots, X_{n}$ be univariate observations on $X$ (independent and continuous, and depending on some parameters, $\theta_{1}, \ldots, \theta_{k}$.  The density function of $X_{i}$ is denoted as $f(x_{i}; \theta_{1}, \ldots, \theta_{k})$.

\bigskip

B)  The likelihood of observing $x_{1}, \ldots, x_{n}$ for values of $X_{1}, \ldots, X_{n}$ is the joint density of the $n$ observations, and because the observations are independent, \[ \mathrm{L}(\theta_{1}, \ldots, \theta_{k}) \equiv \prod_{i=1}^{n} f(x_{i}; \theta_{1}, \ldots, \theta_{k})\ , \] i.e., we assume $x_{1}, \ldots, x_{n}$ are already observed, and that $\mathrm{L}$ is a function of the parameters only.

\bigskip

Now, choose parameter values such that $\mathrm{L}(\theta_{1}, \ldots, \theta_{k})$ is at a maximum --- i.e., the probability of observing that particular sample is maximized by the choice of  $\theta_{1}, \ldots, \theta_{k}$.  Generally, it is easier and equivalent to maximize the log-likelihood using $\ell(\theta_{1}, \ldots, \theta_{k}) \equiv \log( \mathrm{L}(\theta_{1}, \ldots, \theta_{k}))$.

\bigskip


C)  Generally, the maximum values are found through differentiation, and by setting the partial derivatives equal to zero.  Explicitly, we find $\theta_{1}, \ldots, \theta_{k}$ such that \[ \frac{\partial}{\partial \theta_{j}} \ell(\theta_{1}, \ldots, \theta_{k}) = 0 \ , \] for $j = 1, \ldots, k$.  (We should probably check second derivatives to see if we have a maximum, but this is almost always true, anyways.)

\bigskip

Example:

\bigskip

Suppose $X_{i} \sim \mathrm{N}(\mu, \sigma^{2})$, with density \[ f(x_{i}; \mu, \sigma^{2}) = \frac{1}{\sigma \sqrt{2\pi}}\exp\{-(x_{i} - \mu)^{2}/2\sigma^{2}\} \ . \]  The likelihood has the form \[ \mathrm{L}(\mu, \sigma^{2}) = \prod_{i=1}^{n}  \frac{1}{\sigma \sqrt{2\pi}}\exp\{-(x_{i} - \mu)^{2}/2\sigma^{2}\} = \] \[ (1/\sqrt{2\pi})^{n}(1/\sigma^{2})^{n/2} \exp\{-(1/2\sigma^{2}) \sum_{i=1}^{n} (x_{i} - \mu)^{2}\} \ . \] The log-likelihood reduces: \[ \ell(\mu, \sigma^{2}) = \log \mathrm{L}(\mu, \sigma^{2}) = \] \[\log(1/\sqrt{2\pi})^{n} + \log(1/\sigma^{2})^{n/2} + (-(1/2\sigma^{2}) \sum_{i=1}^{n} (x_{i} - \mu)^{2}) = \] \[ \mathrm{constant} - (n/2) \log \sigma^{2} - (1/2\sigma^{2}) \sum_{i=1}^{n} (x_{i} - \mu)^{2} \ . \]

\bigskip

The partial derivatives have the form:
\[ \frac{\partial \ell(\mu, \sigma^{2})}{\partial \mu} = -(1/2\sigma^{2}) \sum_{i=1}^{n} 2(x_{i} - \mu)(-1) \ , \] and  \[ \frac{\partial \ell(\mu, \sigma^{2})}{\partial \sigma^{2}} = -(n/2)(1/\sigma^{2}) - (1/2(\sigma^{2})^{2})(-1) \sum_{i=1}^{n} (x_{i} - \mu)^{2} \ . \]  Setting these two expressions to zero, gives \[ \hat{\mu} = \sum_{i=1}^{n} x_{i} / n \ ; \  \hat{\sigma}^{2} = (1/n) \sum_{i=1}^{n} (x_{i} - \hat{\mu})^{2} \ . \]

\bigskip

Maximum likelihood (ML) estimates are generally consistent, asymptotically normal (for large $n$), and efficient; a function of the sufficient statistics; and have an invariance to operations performed on them -- e.g., the ML estimate for $\sigma$ is just $\sqrt{\hat{\sigma}^{2}}$.  As  the ML estimator for $\sigma^{2}$ shows, ML estimators are not necessarily unbiased.

\bigskip

Another Example:

\bigskip

Suppose $X_{1}, \ldots, X_{n}$ are observations on the Poisson discrete random variable $X$ (having outcomes $0, 1, \ldots $): \[ P(X_{i} = x_{i}) = \frac{\exp(-\lambda) \lambda^{x_{i}}}{x_{i}!} \ . \]  The likelihood is \[ \mathrm{L}(\lambda) = \prod_{i=1}^{n} P(X_{i} = x_{i}) = \frac{\exp(-n\lambda) \lambda^{\sum_{i} x_{i}}}{\prod_{i} x_{i}!} \ , \] and the log-likelihood \[ \ell(\lambda) = \log \mathrm{L}(\lambda) = -n\lambda + \log (\lambda ^{\sum_{i} x_{i}}) - \log \prod_{i} x_{i}! \ . \] The partial derivative \[ \frac{\partial \ell(\lambda)}{\partial \lambda} = -n + (1/\lambda) \sum_{i=1}^{n} x_{i} \ , \] and when set to zero, gives \[ \hat{\lambda} = \sum_{i=1}^{n} x_{i} / n \ . \]

\subsection{Likelihood Ratio Tests}

Besides using the likelihood concept to find good point estimates, the likelihood can also be used to find good tests of hypotheses.  We will develop this idea in terms of a simple example:  Suppose $X_{1} = x_{1}, \ldots, X_{n} = x_{n}$ denote values obtained on $n$ independent observations on a $\mathrm{N}(\mu, \sigma^{2})$ random variable, and where $\sigma^{2}$ is assumed known.  The standard test of $H_{0}: \mu = \mu_{0}$ versus $H_{1}: \mu \ne \mu_{0}$, would compare $(\bar{x}_{\cdot} - \mu_{0})^{2} / (\sigma^{2}/n)$ to a $\chi^{2}$ random variable with one-degree of freedom.  Or, if we chose the significance level to be $.05$, we could reject $H_{0}$ if $|\bar{x}_{\cdot} - \mu_{0}|/(\sigma/\sqrt{n}) \ge 1.96$.  We now develop this same test using likelihood ideas:

\bigskip

The likelihood of observing $x_{1}, \ldots, x_{n}$, $\mathrm{L}(\mu)$, is only a function of $\mu$ (because $\sigma^{2}$ is assumed to be known):
\[ \mathrm{L}(\mu) =  (1/\sigma \sqrt{2\pi})^{n} \exp\{-(1/2\sigma^{2}) \sum_{i=1}^{n} (x_{i} - \mu)^{2}\} \ . \]  Under $H_{0}$,  $\mathrm{L}(\mu)$ is at a maximum for $\mu = \mu_{0}$; under $H_{1}$, $\mathrm{L}(\mu)$ is at a maximum for $\mu = \hat{\mu}$, the maximum likelihood estimator.  If $H_{0}$ is true, $\mathrm{L}(\hat{\mu})$ and $\mathrm{L}(\mu_{0})$ should be close to each other; If $H_{0}$ is false, $\mathrm{L}(\hat{\mu})$ should be much larger than $\mathrm{L}(\mu_{0})$.  Thus, the decision rule would be to reject $H_{0}$ if $\mathrm{L}(\mu_{0}) / \mathrm{L}(\hat{\mu}) \le \lambda$, where $\lambda$ is some number less than 1.00 and  chosen to obtain a particular $\alpha$ level.  The ratio, ($\mathrm{L}(\mu_{0}) / \mathrm{L}(\hat{\mu})) = \exp\{-(1/2)(\hat{\mu} - \mu_{0})^{2}\}/(\sigma^{2}/n)$.  Thus, we could rephrase the decision rule: reject $H_{0}$ if $(\hat{\mu} - \mu_{0})^{2}\}/(\sigma^{2}/n) \ge -2\log(\lambda)$, or if  $|\bar{x}_{\cdot} - \mu_{0}|/(\sigma/\sqrt{n}) \ge \sqrt{-2\log \lambda}$.  Thus, for an $.05$ level of significance, choose $\lambda$ so $1.96 = \sqrt{-2\log \lambda}$.  Generally, we can phrase likelihood ratio tests as: \[ -2\log(\mathrm{likelihood \ ratio})   \sim \chi^{2}_{\nu - \nu_{0}} \ , \] where $\nu$ is the dimension of the parameter space generally, and $\nu_{0}$ is the dimension under $H_{0}$.


\subsection{Estimation}

To obtain estimates of the various quantities we need, merely replace the variances and covariances by their maximum likelihood estimates. This process generates sample partial correlations or covariances; sample multiple squared correlations; sample regression parameters; and so on.






\end{document}