\documentclass[12pt]{report}

\setlength{\textwidth}{6.5in}
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{0.0in}
\setlength{\oddsidemargin}{0.0in}


\usepackage{graphics}
\usepackage{curves}

\begin{document}

Version: \today

\Huge

\begin{center}

\textbf{Notes for Applied Multivariate Analysis with MATLAB}

\end{center}


\Large

\bigskip

These notes were written for use in the Quantitative Psychology courses at the University of Illinois, Champaign.  The expectation is that for Psychology 406/7 (Statistical Methods I and II), the material up through Section 0.1.12 be available to a student.  For Multivariate Analysis (Psychology 594) and Covariance Structure and Factor Models (Psychology 588), the remainder of the notes are relevant, with particular emphasis on the Singular Value Decomposition (SVD) and the Eigenvector/Eigenvalue Decomposition (Spectral Decomposition).

\newpage

\tableofcontents


\listoffigures

\newpage


\section{Necessary Matrix Algebra Tools}

The strategies of multivariate analysis tend to be confusing unless
specified compactly in matrix terms.  Therefore, we will spend some
significant amount of time on these topics because, in fact, most of
multivariate analysis falls out directly once we have these tools
under control. Remember the old Saturday Night Live skit with Hans
and Franz, ``listen to me now, and believe me later.''  I have a
goal in mind of where I would like you all to be --- at the point of
understanding and being able to work with what is called the
Singular Value Decomposition (SVD) of a matrix, and to understand
the matrix topics that lead up to the SVD. Very much like teaching
how to use some word-processing program, where we need to learn all
the various commands and what they do, an introduction to the matrix
tools can seem a little disjointed.  But just like word-processing
comes together more meaningfully when  required to do your own
manuscripts from beginning to end, once we proceed into the
techniques of multivariate analysis per se, the wisdom of this
preliminary matrix excursion will become apparent.


\subsection{Preliminaries}


A \emph{matrix} is merely an array of numbers; for example,


\[
  \left( \begin{array}{rrrr}
    4 & -1 & 3 & 1 \\
    4 & 6 & 0 & 2 \\
    7 & 2 & 1 & 4 \\
  \end{array} \right)
\]


\noindent is a matrix.  In general, we denote a matrix by an
uppercase (capital) boldface letter such as $\mathbf{A}$ (or using a
proofreader representation on the blackboard, a capital letter with
a wavy line underneath to indicate boldface):

\[ \mathbf{A} =
\left( \begin{array}{ccccc}
a_{11}         & a_{12}         & a_{13}         & \cdots    & a_{1V}        \\
a_{21}           & a_{22}       & a_{23}        & \cdots    & a_{2V}   \\
\vdots      & \vdots    & \vdots         & \ddots    & \vdots        \\
a_{U1}           & a_{U2}    & a_{U3}    & \cdots        & a_{UV}     \\
\end{array} \right) \]


\noindent This matrix has $U$ rows and $V$ columns and is said to
have \emph{order} $U \times V$.  An arbitrary element $a_{uv}$
refers to the element in the $u^{th}$ row and $v^{th}$ column, with
the row index always preceding the column index (and therefore, we
might use the notation of $\mathbf{A} = \{a_{uv}\}_{U \times V}$ to
indicate the matrix $\mathbf{A}$ as well as its order).

\smallskip

A $1 \times 1$ matrix such as $(4)_{1 \times 1}$ is just an ordinary
number, called a \emph{scalar}.  So without loss of any generality,
numbers are just matrices.  A \emph{vector} is a matrix with a
single row or column; we denote a column vector by a lowercase
boldface letter, e.g., $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, and
so on.  The vector

\[ \mathbf{x} =
\left(
  \begin{array}{c}
    x_{1} \\
    \vdots \\
    x_{U} \\
  \end{array}
\right)_{U \times 1} \]

\noindent is of order $U \times 1$; the column indices are typically
omitted since there is only one.  A row vector is written as

\[ \mathbf{x}' = (x_{1}, \ldots, x_{U})_{1 \times U} \]

\noindent with the prime indicating the \emph{transpose} of
$\mathbf{x}$, i.e., the interchange of row(s) and column(s).  This
transpose operation can be applied to any matrix; for example,

\[ \mathbf{A} =
\left(
  \begin{array}{rr}
    1 & -1 \\
    3 & 7 \\
    4 & 1 \\
  \end{array}
\right)_{3 \times 2} \]

\[ \mathbf{A}' =
\left(
  \begin{array}{rrr}
    1 & 3 & 4 \\
    -1 & 7 & 1 \\
  \end{array}
\right)_{2 \times 3} \]

If a matrix is \emph{square}, defined by having the same number of
rows as columns, say $U$, and if the matrix and its transpose are
equal, the matrix is said to be \emph{symmetric}.  Thus, in
$\mathbf{A} = \{a_{uv}\}_{U \times U}$, $a_{uv} = a_{vu}$ for all
$u$ and $v$.  As an example,

\[ \mathbf{A} = \mathbf{A}' =
\left(
  \begin{array}{rrr}
    1 & 4 & 3 \\
    4 & 7 & -1 \\
    3 & -1 & 3 \\
  \end{array}
\right) \]

\noindent For a square matrix $\mathbf{A}_{U \times U}$, the
elements $a_{uu}$, $1 \le u \le U$, lie along the \emph{main} or
\emph{principal} diagonal.  The sum of main diagonal entries of a
square matrix is called the \emph{trace}; thus,

\[
\mathrm{trace}(\mathbf{A}_{U \times U}) \equiv
\mathrm{tr}(\mathbf{A}) = a_{11} + \cdots + a_{UU} \]

A number of special matrices appear periodically in the notes to
follow.  A $U \times V$ matrix of all zeros is called a \emph{null}
matrix, and might be denoted by

\[ \mathbf{\emptyset} = \left(
  \begin{array}{ccc}
    0 & \cdots & 0 \\
    \vdots & \ddots & \vdots \\
    0 & \cdots  & 0 \\
  \end{array}
\right) \]

\noindent Similarly, we might at times need an $U \times V$ matrix
of all ones, say $\mathbf{E}$:

\[ \mathbf{E} = \left(
  \begin{array}{ccc}
    1 & \cdots & 1 \\
    \vdots & \ddots & \vdots \\
    1 & \cdots  & 1 \\
  \end{array}
\right) \]


  \noindent A \emph{diagonal} matrix is square
with zeros in all the off main-diagonal positions:

\[ \mathbf{D}_{U \times U} =
\left(
  \begin{array}{ccc}
    a_{1} & \cdots & 0 \\
    \vdots & \ddots & \vdots \\
    0 & \cdots & a_{U} \\
  \end{array}
\right)_{U \times U} \]

\noindent Here, we again indicate the main diagonal entries with
just one index as $a_{1}, a_{2}, \ldots, a_{U}$.  If all of the main
diagonal entries in a diagonal matrix are $1$s, we have the
\emph{identity} matrix denoted by $\mathbf{I}$:

\[ \mathbf{I} =
\left(
  \begin{array}{cccc}
    1 & 0 & \cdots & 0 \\
    0 & 1 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & 1 \\
  \end{array}
\right) \]

To introduce some useful operations on matrices, suppose we have two
matrices $\mathbf{A}$ and $\mathbf{B}$ of the same $U \times V$
order:

\[ \mathbf{A} =
\left(
  \begin{array}{ccc}
    a_{11} & \cdots & a_{1V} \\
    \vdots & \ddots & \vdots \\
    a_{U1} & \cdots & a_{UV} \\
  \end{array}
\right)_{U \times V} \]

\[ \mathbf{B} =
\left(
  \begin{array}{ccc}
    b_{11} & \cdots & b_{1V} \\
    \vdots & \ddots & \vdots \\
    b_{U1} & \cdots & b_{UV} \\
  \end{array}
\right)_{U \times V} \]

\noindent As a definition for equality of two matrices of the same
order (and for which it only makes sense to talk about equality), we
have:

$\mathbf{A}$ = $\mathbf{B}$ if and only if $a_{uv}$ = $b_{uv}$ for
all $u$ and $v$.

\noindent Remember, the ``if and only if'' statement (sometimes
abbreviated as ``iff'') implies two conditions:

if $\mathbf{A}$ = $\mathbf{B}$, then $a_{uv}$ = $b_{uv}$ for all $u$
and $v$;

if $a_{uv}$ = $b_{uv}$ for all $u$ and $v$, then $\mathbf{A}$ =
$\mathbf{B}$.

\noindent Any definition by its very nature implies an ``if and only
if'' statement.

\smallskip

To add two matrices together, they first have to be of the same
order (referred to as conformable for addition); we then do the
addition component by component:

\[ \mathbf{A} + \mathbf{B} =
\left(
  \begin{array}{ccc}
    a_{11} + b_{11} & \cdots & a_{1V} + b_{1V}\\
    \vdots & \ddots & \vdots \\
    a_{U1} + b_{U1} & \cdots & a_{UV} + b_{UV} \\
  \end{array}
\right)_{U \times V} \]

\indent To preform scalar multiplication of a matrix $\mathbf{A}$
by, say, a constant $c$, we again do the multiplication component by
component:

\[ c\mathbf{A} =
\left(
  \begin{array}{ccc}
    ca_{11} & \cdots & ca_{1V} \\
    \vdots & \ddots & \vdots \\
    ca_{U1} & \cdots & ca_{UV} \\
  \end{array}
\right) = c\left(
  \begin{array}{ccc}
    a_{11} & \cdots & a_{1V} \\
    \vdots & \ddots & \vdots \\
    a_{U1} & \cdots & a_{UV} \\
  \end{array}
\right) \]

\noindent Thus, if one wished to define the difference of two
matrices, we could proceed rather obviously as follows:

\[ \mathbf{A} - \mathbf{B} \equiv \mathbf{A} + (-1)\mathbf{B} =
\{a_{uv} - b_{uv}\} \]


One of the more important matrix operations is multiplication where
two matrices are said to be conformable for multiplication if the
number of rows in one matches the number of columns in the second.
For example, suppose $\mathbf{A}$ is $U \times V$ and $\mathbf{B}$
is $V \times W$; then, because the number of columns in $\mathbf{A}$
matches the number of rows in $\mathbf{B}$, we can define
$\mathbf{AB}$ as $\mathbf{C}_{U \times W}$, where $\{c_{uw}\}$ =
$\{\sum_{k=1}^{V} a_{uk}b_{kw} \}$.  This process might be referred
to as row (of $\mathbf{A}$) by column (of $\mathbf{B}$)
multiplication; the following simple example should make this clear:

 \[ \mathbf{A}_{3 \times 2} =
\left(
  \begin{array}{rr}
    1 & 4 \\
    3 & 1 \\
    -1 & 0 \\
  \end{array}
\right) , \; \mathbf{B}_{2 \times 4} = \left(
            \begin{array}{rrrr}
              -1 & 2 & 0 & 1 \\
              1 & 0 & 1 & 4 \\
            \end{array}
          \right) ; \]

\[ \mathbf{A}\mathbf{B} = \mathbf{C}_{3 \times 4} = \] \[
\left(
  \begin{array}{rrrr}
    1(-1) + 4(1) & 1(2) + 4(0) & 1(0) + 4(1) & 1(1) + 4(4) \\
    3(-1) + 1(1) & 3(2) + 1(0) & 3(0) + 1(1) & 3(1) + 1(4) \\
    -1(-1) + 0(1) & -1(2) + 0(0) & -1(0) + 0(1) & -1(1) + 0(4) \\
  \end{array}
\right) = \]

\[ \left(
            \begin{array}{rrrr}
              3 & 2 & 4 & 17 \\
              -2 & 6 & 1 & 7 \\
              1 & -2 & 0 & -1 \\
            \end{array}
          \right) \]


Some properties of matrix addition and multiplication follow, where
the matrices are assumed conformable for the operations given:

\smallskip

(A) matrix addition is commutative:

 \[ \mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A} \]

(B) matrix addition is associative:

\[ \mathbf{A} + (\mathbf{B} + \mathbf{C}) = (\mathbf{A} +
\mathbf{B}) + \mathbf{C} \]

(C) matrix multiplication is right and left distributive over matrix
addition:

\[
\mathbf{A}(\mathbf{B} +
 \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C} \]

\[(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{A}\mathbf{C} +
\mathbf{B}\mathbf{C} \]

(D) matrix multiplication is associative:

\[\mathbf{A}(\mathbf{B}\mathbf{C}) =
(\mathbf{A}\mathbf{B})\mathbf{C} \]

\noindent In general, $\mathbf{A}\mathbf{B} \ne
\mathbf{B}\mathbf{A}$ even if both products are defined.  Thus,
multiplication is not commutative as the following simple example
shows:

\[ \mathbf{A}_{2 \times 2} =
\left(
  \begin{array}{cc}
    0 & 1 \\
    1 & 0 \\
  \end{array}
\right); \; \mathbf{B}_{2 \times 2} = \left(
             \begin{array}{cc}
               1 & 1 \\
               0 & 1 \\
             \end{array}
           \right); \; \mathbf{A}\mathbf{B} = \left(
                                               \begin{array}{cc}
                                                 0 & 1 \\
                                                 1 & 1 \\
                                               \end{array}
                                             \right); \;
                                             \mathbf{B}\mathbf{A} = \left(
                                                                      \begin{array}{cc}
                                                                        1 & 1 \\
                                                                        1 & 0 \\
                                                                      \end{array}
                                                                    \right)
                                                                    \]

In the product $\mathbf{A}\mathbf{B}$, we say that $\mathbf{B}$ is
\emph{premultiplied} by $\mathbf{A}$ and $\mathbf{A}$ is
\emph{postmultiplied} by $\mathbf{B}$.  Thus, if we pre- or
postmultiply a matrix by the identity, the same matrix is retrieved:

\[ \mathbf{I}_{U \times U}\mathbf{A}_{U \times V} = \mathbf{A}_{U \times
V}; \; \mathbf{A}_{U \times V}\mathbf{I}_{V \times V} =
\mathbf{A}_{U \times V} \]

\noindent If we premultiply $\mathbf{A}$ by a diagonal matrix
$\mathbf{D}$, then each row of $\mathbf{A}$ is multiplied by a
particular diagonal entry in $\mathbf{D}$:

\[ \mathbf{D}_{U \times U}\mathbf{A}_{U \times V} = \left(
                                                      \begin{array}{ccc}
                                                        d_{1}a_{11} & \cdots & d_{1}a_{1V} \\
                                                        \vdots & \ddots & \vdots \\
                                                        d_{U}a_{U1} & \cdots & d_{U}a_{UV} \\
                                                      \end{array}
                                                    \right) \]


\noindent If $\mathbf{A}$ is post-multiplied by a diagonal matrix
$\mathbf{D}$, then each column of $\mathbf{A}$ is multiplied by a
particular diagonal entry in $\mathbf{D}$:

\[ \mathbf{A}_{U \times V}\mathbf{D}_{V \times V} = \left(
                                                      \begin{array}{ccc}
                                                        d_{1}a_{11} & \cdots & d_{V}a_{1V} \\
                                                        \vdots & \ddots & \vdots \\
                                                        d_{1}a_{U1} & \cdots & d_{V}a_{UV} \\
                                                      \end{array}
                                                    \right) \]

\noindent Finally, we end this section with a few useful results on
the transpose operation and matrix multiplication and addition:

\[ (\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}'; \; (\mathbf{A}\mathbf{B}\mathbf{C})' =
\mathbf{C}'\mathbf{B}'\mathbf{A}'; \; \dots \]

\[ (\mathbf{A}')' = \mathbf{A}; \; (\mathbf{A} + \mathbf{B})' =
\mathbf{A}' + \mathbf{B}' \]

\subsection{The Data Matrix}

A very common type of matrix encountered in multivariate analysis is
what is referred to as a data matrix containing, say, observations
for $N$ subjects on $P$ variables.  We will typically denote this
matrix by $\mathbf{X}_{N \times P}$ = $\{x_{ij}\}$, with a generic
element of $x_{ij}$ referring to the observation for subject or row
$i$ on variable or column $j$ ($1 \le i \le N$ and $1 \le j \le P$):

\[ \mathbf{X}_{N \times P} =
\left(
  \begin{array}{cccc}
    x_{11} & x_{12} & \cdots & x_{1P} \\
    x_{21} & x_{22} & \cdots & x_{2P} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{N1} & x_{N2} & \cdots & x_{NP} \\
  \end{array}
\right) \]

\noindent All right-thinking people always list subjects as rows and
variables as columns, conforming also to the now-common convention
for computer spreadsheets.

\smallskip

Any matrix in general, including a data matrix, can be viewed either
as a collection of its row vectors or of its column vectors, and
these interpretations can be generally useful.  For a data matrix
$\mathbf{X}_{N \times P}$, let $\mathbf{x}_{i}' = (x_{i1}, \ldots,
x_{iP})_{1 \times P}$ denote the row vector for subject $i$, $1 \le
i \le N$, and let $\mathbf{v}_{j}$ denote the $N \times 1$ column
vector for variable $j$:

\[ \mathbf{v}_{j} = \left(
                                   \begin{array}{c}
                                     x_{1j} \\
                                     \vdots \\
                                     x_{Nj} \\
                                   \end{array}
                                 \right)_{N \times 1} \]

 \noindent Thus, each subject could be viewed as providing a vector of coordinates ($1 \times P$) in $P$-dimensional
  ``variable space,'' where the $P$ axes correspond to the $P$
  variables; or each variable could be viewed as providing a vector
  of coordinates ($N \times 1$) in ``subject space,'' where the $N$
  axes correspond to the $N$ subjects:

  \[
  \mathbf{X}_{N \times P} = \left(
                              \begin{array}{c}
                                \mathbf{x}_{1}' \\
                                \mathbf{x}_{2}' \\
                                \vdots \\
                                \mathbf{x}_{N}' \\
                              \end{array}
                            \right) = \left(
                                        \begin{array}{cccc}
                                          \mathbf{v}_{1} & \mathbf{v}_{2} & \cdots & \mathbf{v}_{P} \\
                                        \end{array}
                                      \right) \]

\subsection{Inner Products}

The \emph{inner product} (also called the dot or scalar product) of
two vectors , $\mathbf{x}_{U \times 1}$ and $\mathbf{y}_{U \times
1}$, is defined as

\[ \mathbf{x}'\mathbf{y} = (x_{1}, \ldots, x_{U})\left(
                                                   \begin{array}{c}
                                                     y_{1} \\
                                                     \vdots \\
                                                     y_{U} \\
                                                   \end{array}
                                                 \right) = \sum_{u = 1}^{U} x_{u}y_{u} \]

\noindent Thus, the inner product of a vector with itself is merely
the sum of squares of the entries in the vector:
$\mathbf{x}'\mathbf{x} = \sum_{u = 1}^{U} x_{u}^{2}$.  Also, because
an inner product is a scalar and must equal it own transpose (i.e.,
$\mathbf{x}'\mathbf{y} = (\mathbf{x}'\mathbf{y})' =
\mathbf{y}'\mathbf{x}$), we have the end result that

\[ \mathbf{x}'\mathbf{y}
=   \mathbf{y}'\mathbf{x} \]

\noindent If there is an inner product, there should also be an
\emph{outer product}  defined as the $U \times U$ matrices given by
$\mathbf{x}\mathbf{y}'$ or as $\mathbf{y}\mathbf{x}'$.  As indicated
by the display equations below, $\mathbf{x}\mathbf{y}'$ is the
transpose of $\mathbf{y}\mathbf{x}'$:

\[ \mathbf{x}\mathbf{y}' = \left(
                             \begin{array}{c}
                               x_{1} \\
                               \vdots \\
                               x_{U} \\
                             \end{array}
                           \right)(y_{1}, \dots, y_{N}) = \left(
                                                            \begin{array}{ccc}
                                                              x_{1}y_{1} & \cdots & x_{1}y_{U} \\
                                                              \vdots & \dots & \vdots \\
                                                              x_{U}y_{1} & \cdots & x_{U}y_{U} \\
                                                            \end{array}
                                                          \right) \]
\[ \mathbf{y}\mathbf{x}' = \left(
                             \begin{array}{c}
                               y_{1} \\
                               \vdots \\
                               y_{U} \\
                             \end{array}
                           \right)(x_{1}, \dots, x_{U}) = \left(
                                                            \begin{array}{ccc}
                                                              y_{1}x_{1} & \cdots & y_{1}x_{U} \\
                                                              \vdots & \dots & \vdots \\
                                                              y_{U}x_{1} & \cdots & y_{U}x_{U} \\
                                                            \end{array}
                                                          \right) \]


A vector can be viewed as a geometrical vector in $U$ dimensional
space.  Thus, the two $2 \times 1$ vectors

\[ \mathbf{x} = \left(
                  \begin{array}{c}
                    3 \\
                    4 \\
                  \end{array}
                \right) ; \; \mathbf{y} = \left(
                                           \begin{array}{c}
                                             4 \\
                                             1 \\
                                           \end{array}
                                         \right) \]
\noindent can be represented in the two-dimensional Figure 1
below, with the entries in the vectors defining the coordinates of
the endpoints of the arrows.

\smallskip

\begin{figure}

\setlength{\unitlength}{1cm}

\begin{picture}(9,9)

\put(0,0){\vector(1,0){6}}
 \put(0,0){\vector(0,1){6}}

\put(0,0){\vector(3,4){3}}
 \put(0,0){\vector(4,1){4}}

 \put(3,4.1){\makebox(0,0)[br]{(3,4)}}

 \put(4,1.1){\makebox(0,0)[br]{(4,1)}}

\put(-.2,0){\makebox(0,0)[tr]{(0,0)}}

\put(3,2){\makebox(0,0)[b]{$\theta$}}

\put(0,0){\arc(3,.75){39}}

\put(1,-.1){\line(0,1){.2}}

\put(2,-.1){\line(0,1){.2}}

\put(3,-.1){\line(0,1){.2}}

\put(4,-.1){\line(0,1){.2}}

\put(-.1,1){\line(1,0){.2}}

\put(-.1,2){\line(1,0){.2}}

\put(-.1,3){\line(1,0){.2}}

\put(-.1,4){\line(1,0){.2}}


\put(1,-.2){\makebox(0,0)[t]{1}}

\put(2,-.2){\makebox(0,0)[t]{2}}

\put(3,-.2){\makebox(0,0)[t]{3}}

\put(4,-.2){\makebox(0,0)[t]{4}}

\put(-.2,1){\makebox(0,0)[r]{1}}

\put(-.2,2){\makebox(0,0)[r]{2}}

\put(-.2,3){\makebox(0,0)[r]{3}}

\put(-.2,4){\makebox(0,0)[r]{4}}


\end{picture}

\caption{Two vectors plotted in two-dimensional space}

\end{figure}

The \emph{Euclidean distance} between two vectors, $\mathbf{x}$ and
$\mathbf{y}$, is given as:

\[ \sqrt{\sum_{u=1}^{U} (x_{u} - y_{u})^{2}} = \sqrt{(\mathbf{x} - \mathbf{y})'(\mathbf{x} -
\mathbf{y})} \]

\noindent and the \emph{length} of any vector is the Euclidean
distance between the vector and the origin.  Thus, in Figure 1,
the distance between $\mathbf{x}$ and $\mathbf{y}$ is $\sqrt{10}$
with respective lengths of $5$ and $\sqrt{17}$.

\smallskip

The cosine of the angle between the two vectors $\mathbf{x}$ and
$\mathbf{y}$ is defined by:

\[ \cos(\theta) =
\frac{\mathbf{x}'\mathbf{y}}{(\mathbf{x}'\mathbf{x})^{1/2}(\mathbf{y}'\mathbf{y})^{1/2}}
\]

\noindent Thus, in the figure we have

\[\cos(\theta) = \frac{\left(
                         \begin{array}{cc}
                           3 & 4 \\
                         \end{array}
                       \right)
\left(
                                                                   \begin{array}{c}
                                                                     4 \\
                                                                     1 \\
                                                                   \end{array}
                                                                 \right)}{5\sqrt{17}}
                                                                 =
                                                                 \frac{16}{5\sqrt{17}}
                                                                 =
                                                                 .776
                                                                 \]

\noindent The cosine value of $.776$ corresponds to an angle of
$39.1$ degrees or $.68$ radians; these later values can be found
with the inverse (or arc) cosine function (on, say, a hand
calculator, or using MATLAB as we suggest in the next section).

\smallskip

When the means of the entries in $\mathbf{x}$ and $\mathbf{y}$ are
zero (i.e., deviations from means have been taken), then
$\cos(\theta)$ is the correlation between the entries in the two
vectors.  Vectors at right angles have $\cos(\theta) = 0$, or
alternatively, the correlation is zero.


\begin{figure}

\setlength{\unitlength}{1cm}

\begin{picture}(9,9)(-2,-2)

\put(0,0){\vector(1,1){4}}

\put(0,0){\vector(1,0){6}}

\put(0,0){\vector(1,0){4}}

\put(4,-1){\vector(0,1){1}}

\put(4,0){\line(0,1){4}}

\put(2,2.1){\makebox(0,0)[r]{$c^2$}}

\put(4.1,2){\makebox(0,0)[l]{$b^2$}}

\put(2,-.1){\makebox(0,0)[t]{$a^2$}}

\put(0,0){\arc(2,0){45}}


\put(4,4.1){\makebox(0,0)[b]{$\mathbf{x}$}}

\put(6.1,0){\makebox(0,0)[l]{$\mathbf{y}$}}

\put(4,-1){\makebox(0,0)[t]{$d\mathbf{y}$}}

\put(2,1){\makebox(0,0){$\theta$}}


\end{picture}

\caption{Illustration of projecting one vector onto another}

\end{figure}

\smallskip

Figure 2 shows two generic vectors, $\mathbf{x}$ and $\mathbf{y}$,
where without loss of any real generality, $\mathbf{y}$ is drawn
horizontally in the plane and $\mathbf{x}$ is projected at a right
angle onto the vector $\mathbf{y}$ resulting in a point defined as a
multiple $d$ of the vector $\mathbf{y}$.  The formula for $d$ that
we demonstrate below is based on the Pythagorean theorem that $c^2 =
b^2 + a^2$:

\[ c^2 =
b^2 + a^2 \Rightarrow \mathbf{x}'\mathbf{x} = (\mathbf{x} -
d\mathbf{y})'(\mathbf{x} - d\mathbf{y}) + d^2\mathbf{y}'\mathbf{y}
\Rightarrow \]

\[ \mathbf{x}'\mathbf{x} = \mathbf{x}'\mathbf{x} -
d\mathbf{x}'\mathbf{y} - d\mathbf{y}'\mathbf{x} +
d^2\mathbf{y}'\mathbf{y} + d^2\mathbf{y}'\mathbf{y} \Rightarrow \]

\[0 = -2d\mathbf{x}'\mathbf{y} + 2d^2\mathbf{y}'\mathbf{y}
\Rightarrow \]

\[ d = \frac{\mathbf{x}'\mathbf{y}}{\mathbf{y}'\mathbf{y}} \]

\noindent The diagram in Figure 2 is somewhat constricted in the
sense that the angle between the vectors shown is less than $90$
degrees; this allows the constant $d$ to be positive. Other angles
might lead to negative $d$ when defining the projection of
$\mathbf{x}$ onto $\mathbf{y}$, and would merely indicate the need
to consider the vector $\mathbf{y}$ oriented in the opposite
(negative) direction.  Similarly, the vector $\mathbf{y}$ is drawn
with a larger length than $\mathbf{x}$ which gives a value for $d$
that is less than $1.0$; otherwise, $d$ would be greater than $1.0$, indicating a need to stretch $\mathbf{y}$ to represent the point of
projection onto it.

\smallskip

There are other formulas possible based on this geometric
information: the length of the projection is merely $d$ times the
length of $\mathbf{y}$; and $\cos(\theta)$ can be given as the
length of $d\mathbf{y}$ divided by the length of $\mathbf{x}$, which
is $d\sqrt{\mathbf{y}'\mathbf{y}}/\sqrt{\mathbf{x}'\mathbf{x}} =
\mathbf{x}'\mathbf{y}/(\sqrt{\mathbf{x}'\mathbf{x}}\sqrt{\mathbf{y}'\mathbf{y}})$.


\subsection{Determinants}

To each square matrix, $\mathbf{A}_{U \times U}$, there is an
associated scalar called the \emph{determinant} of $\mathbf{A}$ that
is denoted by $|\mathbf{A}|$ or $\det(\mathbf{A})$.  Determinants up
to a $3 \times 3$ can be given by formula:

\[ \det(\left(
     \begin{array}{c}
       a \\
     \end{array}
   \right)_{1 \times 1}) = a; \; \det(\left(
                     \begin{array}{cc}
                       a & b \\
                       c & d \\
                     \end{array}
                   \right)_{2 \times 2}) = ad - bc; \] \[ \det(\left(
                                           \begin{array}{ccc}
                                             a & b & c \\
                                             d & e & f \\
                                             g & h & i \\
                                           \end{array}
                                         \right)_{3 \times 3}) = aei + dhc + gfb
                                         -(ceg + fha + idb) \]

\noindent Beyond a $3 \times 3$ we can use a recursive process
illustrated below.  This requires the introduction of a few
additional matrix terms that we now give: for a square matrix
$\mathbf{A}_{U \times U}$, define $\mathbf{A}_{uv}$ to be the $(n-1)
\times (n-1)$ submatrix of $\mathbf{A}$ constructed by deleting the
$u^{th}$ row and $v^{th}$ column of $\mathbf{A}$. We call
$\det(\mathbf{A}_{uv})$ the \emph{minor} of the entry $a_{uv}$; the
signed minor of $(-1)^{u+v}\det(\mathbf{A}_{uv})$ is called the
\emph{cofactor} of $a_{uv}$.  The recursive algorithm would chose
some row or column (rather arbitrarily), and find the cofactors for
the entries in it; the cofactors would then be weighted by the
relevant entries and summed.

\smallskip

As an example, consider the $4 \times 4$ matrix

\[ \left(
     \begin{array}{rrrr}
       1 & -1 & 3 & 1 \\
       -1 & 1 & 0 & -1 \\
       3 & 2 & 1 & 2 \\
       1 & 2 & 4 & 3 \\
     \end{array}
   \right) \]

\noindent and choose the second row.  The expression below involves
the weighted cofactors for $3 \times 3$ submatrices that can be
obtained by formulas.  Beyond a $4 \times 4$ there will be nesting
of the processes:

\[ (-1)((-1)^{2+1})\det(\left(
                        \begin{array}{rrr}
                          -1 & 3 & 1 \\
                          2 & 1 & 2 \\
                          2 & 4 & 3 \\
                        \end{array}
                      \right)) +
(1)((-1)^{2+2})\det(\left(
                        \begin{array}{rrr}
                          1 & 3 & 1 \\
                          3 & 1 & 2 \\
                          1 & 4 & 3 \\
                        \end{array}
                      \right)) + \] \[
(0)((-1)^{2+3})\det(\left(
                        \begin{array}{rrr}
                          1 & -1 & 1 \\
                          3 & 2 & 2 \\
                          1 & 2 & 3 \\
                        \end{array}
                      \right)) +
(-1)((-1)^{2+4})\det(\left(
                        \begin{array}{rrr}
                          1 & -1 & 3 \\
                          3 & 2 & 1 \\
                          1 & 2 & 4 \\
                        \end{array}
                      \right))  = \] \[ 5 + (-15) + 0 + (-29) = -39 \]

Another strategy to find the determinant of a matrix is to reduce it
a form in which we might note the determinant more or less by simple
inspection. The reductions could be carried out by operations that
have a known effect on the determinant; the form which we might seek
is a matrix that is either \emph{upper-triangular} (all entries
below the main diagonal are all zero), \emph{lower-triangular} (all
entries above the main diagonal are all zero), or diagonal.  In
these latter cases, the determinant is merely the product of the
diagonal elements.  Once found, we can note how the determinant
might have been changed by the reduction process and carry out the
reverse changes to find the desired determinant.

\smallskip

The properties of determinants that we could rely on in the above
iterative process are as follows:
\smallskip

\noindent (A) if \emph{one} row of $\mathbf{A}$ is multiplied by a
constant $c$, the new determinant is $c\det(\mathbf{A})$; the same
is true for multiplying a column by $c$;
\smallskip

\noindent (B) if two rows or two columns of a matrix are
interchanged, the sign of the determinant is changed;
\smallskip

\noindent (C) if two rows or two columns of a matrix are equal, the
determinant is zero;
\smallskip

\noindent (D) the determinant is unchanged by adding a multiple of
some row to another row; the same is true for columns;
\smallskip

\noindent (E) a zero row or column implies a zero determinant;
\smallskip

\noindent (F) $\det(\mathbf{A}\mathbf{B}) =
\det(\mathbf{A})\det(\mathbf{B})$

\subsection{Linear Independence/Dependence of Vectors}

Suppose I have a collection of $K$ vectors each of size $U \times
1$, $\mathbf{x}_{1},\ldots,\mathbf{x}_{K}$.  If no vector in the set
can be written as a linear combination of the remaining ones, the
set of vectors is said to be \emph{linearly independent}; otherwise,
the vectors are \emph{linearly dependent}.  As an example, consider
the three vectors:

\[ \mathbf{x}_{1} = \left(
     \begin{array}{r}
       1 \\
       4 \\
       0 \\
     \end{array}
   \right); \;
   \mathbf{x}_{2} = \left(
     \begin{array}{r}
       1 \\
       -1 \\
       1 \\
     \end{array}
   \right); \;
   \mathbf{x}_{3} = \left(
     \begin{array}{r}
       3 \\
       7 \\
       1 \\
     \end{array}
   \right) \]

   \noindent Because $2\mathbf{x}_{1} + \mathbf{x}_{2} =
   \mathbf{x}_{3}$, we have a linear dependence among the three
 vectors; however, $\mathbf{x}_{1}$ and $\mathbf{x}_{2}$, or, $\mathbf{x}_{2}$ and
 $\mathbf{x}_{3}$, are linearly independent.

 \smallskip

 If the $U$ vectors (each of size $U \times 1$), $\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots,
 \mathbf{x}_{U}$, are linearly independent, then the collection
 defines a \emph{basis}, i.e., any vector can be written as a linear
 combination of $\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots,
 \mathbf{x}_{U}$.  For example, using the \emph{standard basis}, $\mathbf{e}_{1}, \mathbf{e}_{2}, \ldots,
 \mathbf{e}_{U}$, where $\mathbf{e}_{u}$ is a vector of all zeros
 except for a single one in the $u^{th}$ position, any vector
 $\mathbf{x}' = (x_{1},\ldots,x_{U})$ can be written as:

 \[ \left(
      \begin{array}{c}
        x_{1} \\
        x_{2} \\
        \vdots \\
        x_{U} \\
      \end{array}
    \right) = x_{1}\left(
                     \begin{array}{c}
                       1 \\
                       0 \\
                       \vdots \\
                       0 \\
                     \end{array}
                   \right) + x_{2}\left(
                                    \begin{array}{c}
                                      0 \\
                                      1 \\
                                      \vdots \\
                                      0 \\
                                    \end{array}
                                  \right) + \cdots + x_{U}\left(
                                                            \begin{array}{c}
                                                              0 \\
                                                              0 \\
                                                              \vdots \\
                                                              1 \\
                                                            \end{array}
                                                          \right) =
                                                          \] \[
                                                          x_{1}\mathbf{e}_{1}
                                                          + x_{2}\mathbf{e}_{2}
                                                          + \cdots +
                                                          x_{U}\mathbf{e}_{U}
                                                          \]

\noindent Bases that consist of \emph{orthogonal} vectors (where all
inner products are zero) are important later in what is known as
principal components analysis.  The standard basis involves
orthogonal vectors, and any other basis may always be modified by
what is called the Gram-Schmidt orthogonalization process to produce
a new basis that does contain all orthogonal vectors.


\subsection{Matrix Inverses}

Suppose $\mathbf{A}$ and $\mathbf{B}$ are both square and of size $U
\times U$.  If $\mathbf{A}\mathbf{B} = \mathbf{I}$, then
$\mathbf{B}$ is said to be an inverse of $\mathbf{A}$ and is denoted
by $\mathbf{A}^{-1} (\equiv \mathbf{B})$.  Also, if
$\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$, then
$\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$ holds automatically. If
$\mathbf{A}^{-1}$ exists, the matrix $\mathbf{A}$ is said to be
\emph{nonsingular}; if $\mathbf{A}^{-1}$ does not exist,
$\mathbf{A}$ is \emph{singular}.

\smallskip

An example:

\[ \left(
     \begin{array}{rr}
       1 & 3 \\
       2 & 1 \\
     \end{array}
   \right)
   \left(
     \begin{array}{rr}
       -1/5 & 3/5 \\
       2/5 & -1/5 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{rr}
       1 & 0 \\
       0 & 1 \\
     \end{array}
   \right) \]

   \[ \left(
     \begin{array}{rr}
       -1/5 & 3/5 \\
       2/5 & -1/5 \\
     \end{array}
   \right)
   \left(
     \begin{array}{rr}
       1 & 3 \\
       2 & 1 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{rr}
       1 & 0 \\
       0 & 1 \\
     \end{array}
   \right) \]

   Given a matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ can be
   found using the following four steps:

   \smallskip

   (A) form a matrix of the same size as $\mathbf{A}$ containing the
   minors for all entries of $\mathbf{A}$;

   \smallskip

   (B) multiply the matrix of minors by $(-1)^{u+v}$ to produce the
   matrix of cofactors;

   \smallskip

   (C) divide all entries in the cofactors matrix by
   $\det(\mathbf{A}$);

   \smallskip

   (D) the transpose of the matrix found in (C) gives
   $\mathbf{A}^{-1}$.

\smallskip

   \noindent As a mnemonic device to remember these four steps, we have the
   phrase ``My Cat Does Tricks'' for Minor, Cofactor, Determinant
   Division, Transpose'' (I tried to work ``my cat turns tricks''
   into the appropriate phrase but failed with the second to the last
   ``t'').  Obviously, an inverse exists for a matrix $\mathbf{A}$
   if $\det(\mathbf{A}) \ne 0$, allowing the division in step (C) to
   take place.

\smallskip

   An example:   for

   \[ \mathbf{A} = \left(
        \begin{array}{rrr}
          1 & 3 & 2 \\
          0 & 1 & 1 \\
          0 & 2 & 1 \\
        \end{array}
      \right); \; \det(\mathbf{A}) = -1 \]

      Step (A), the matrix of minors:

      \[ \left(
           \begin{array}{rrr}
             -1 & 0 & 0 \\
             -1 & 1 & 2 \\
             1 & 1 & 1 \\
           \end{array}
         \right) \]

         Step (B), the matrix of cofactors:

        \[ \left(
           \begin{array}{rrr}
             -1 & 0 & 0 \\
             1 & 1 & -2 \\
             1 & -1 & 1 \\
           \end{array}
         \right) \]

         Step (C), determinant division:

         \[ \left(
           \begin{array}{rrr}
             1 & 0 & 0 \\
             -1 & -1 & 2 \\
             -1 & 1 & -1 \\
           \end{array}
         \right) \]

         Step (D), matrix transpose:

         \[ \mathbf{A}^{-1} = \left(
           \begin{array}{rrr}
             1 & -1 & -1 \\
             0 & -1 & 1 \\
             0 & 2 & -1 \\
           \end{array}
         \right) \]

         We can easily verify that $\mathbf{A}\mathbf{A}^{-1} =
         \mathbf{I}$:

         \[\left(
             \begin{array}{rrr}
               1 & 3 & 2 \\
               0 & 1 & 1 \\
               0 & 2 & 1 \\
             \end{array}
           \right)
           \left(
             \begin{array}{rrr}
               1 & -1 & -1 \\
               0 & -1 & 1 \\
               0 & 2 & -1 \\
             \end{array}
           \right) =
\left(
  \begin{array}{rrr}
    1 & 0 & 0 \\
    0 & 1 & 0 \\
    0 & 0 & 1 \\
  \end{array}
\right) \]

\noindent As a very simple instance of the mnemonic in the case of a
$2 \times 2$ matrix with arbitrary entries:

\[ \mathbf{A} =  \left(
                \begin{array}{rr}
                  a & b \\
                  c & d \\
                \end{array}
              \right) \]

      \noindent the inverse exists if $\det(\mathbf{A}) = ad - bc \ne
      0$:


\[ \mathbf{A}^{-1} = \frac{1}{ad - bc}\left(
                \begin{array}{rr}
                  d & -b \\
                  -c & a \\
                \end{array}
              \right) \]

              Several properties of inverses are given below that
              will prove useful in our continuing presentation:

              \smallskip

              (A) if $\mathbf{A}$ is symmetric, then so is
              $\mathbf{A}^{-1}$;

              \smallskip

              (B) $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$; or, the
              inverse of a transpose is the transpose of the
              inverse;

              \smallskip

              (C) $(\mathbf{A}\mathbf{B})^{-1} =
              \mathbf{B}^{-1}\mathbf{A}^{-1}$; $(\mathbf{A}\mathbf{B}\mathbf{C})^{-1} =
              \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}$; or,
              the inverse of a product is the product of inverses in
              the opposite order;

              \smallskip

              (D) $(c\mathbf{A})^{-1} = (\frac{1}{c})\mathbf{A}^{-1}$;
              or, the inverse of a scalar times a matrix is the
              scalar inverse times the matrix inverse;

              \smallskip

              (E) the inverse of a diagonal matrix, is also diagonal with
              the entries being the inverses of the
              entries from the original matrix (assuming none are
              zero):

              \[ \left(
                   \begin{array}{ccc}
                     a_{1} & \cdots & 0 \\
                     \vdots & \ddots & \vdots \\
                     0 & \cdots & a_{U} \\
                   \end{array}
                 \right)^{-1} =
                 \left(
                   \begin{array}{ccc}
                     \frac{1}{a_{1}} & \cdots & 0 \\
                     \vdots & \ddots & \vdots \\
                     0 & \cdots & \frac{1}{a_{U}} \\
                   \end{array}
                 \right) \]


\subsection{Matrices as Transformations}

Any $U \times V$ matrix $\mathbf{A}$ can be seen as transforming a
$V \times 1$ vector $\mathbf{x}_{V \times 1}$ to another $U \times
1$ vector $\mathbf{y}_{U \times 1}$:

\[ \mathbf{y}_{U \times 1} = \mathbf{A}_{U \times V}\mathbf{x}_{V
\times 1}\]

\noindent or,

\[ \left(
     \begin{array}{c}
       y_{1} \\
       \vdots \\
       y_{U} \\
     \end{array}
   \right) =
   \left(
     \begin{array}{ccc}
       a_{11} & \cdots & a_{1V} \\
       \vdots & \ddots & \vdots \\
       a_{U1} &  \cdots & a_{UV} \\
     \end{array}
   \right)
   \left(
     \begin{array}{c}
       x_{1} \\
       \vdots \\
       x_{V} \\
     \end{array}
   \right) \]

   \noindent where $y_{u} = a_{u1}x_{1} + a_{u2}x_{2} + \cdots +
   a_{uV}x_{V}$.  Alternatively, $\mathbf{y}$ can be written as a
   linear combination of the columns of $\mathbf{A}$ with weights
   given by $x_{1},\ldots,x_{V}$:

   \[ \left(
        \begin{array}{c}
          y_{1} \\
          \vdots \\
          y_{U} \\
        \end{array}
      \right) = x_{1}\left(
                       \begin{array}{c}
                         a_{11} \\
                         \vdots \\
                         a_{U1} \\
                       \end{array}
                     \right) + x_{2}\left(
                       \begin{array}{c}
                         a_{12} \\
                         \vdots \\
                         a_{U2} \\
                       \end{array}
                     \right) + \cdots + x_{V}\left(
                       \begin{array}{c}
                         a_{1V} \\
                         \vdots \\
                         a_{UV} \\
                       \end{array}
                     \right) \]

To indicate one common usage for matrix transformation in a data
context, suppose we consider our data matrix $\mathbf{X} =
\{x_{ij}\}_{N \times P}$, where $x_{ij}$ represents an observation
for subject $i$ on variable $j$.  We would like to use matrix
transformations to produce a standardized matrix $\mathbf{Z} =
\{(x_{ij} - \bar{x_{j}})/s_{j}\}_{N \times P}$, where $\bar{x_{j}}$
is the mean of the entries in the $j^{th}$ column and $s_{j}$ is the
corresponding standard deviation; thus, the columns of $\mathbf{Z}$
all have mean zero and standard deviation one.  A matrix expression
for this transformation could be written as follows:

\[ \mathbf{Z}_{N \times P} = (\mathbf{I}_{N \times N} -
(\frac{1}{N})\mathbf{E}_{N \times N})\mathbf{X}_{N \times
P}\mathbf{D}_{P \times P} \]

\noindent where $\mathbf{I}$ is the identity matrix, $\mathbf{E}$
contains all ones, and $\mathbf{D}$ is a diagonal matrix containing
$\frac{1}{s_{1}}, \frac{1}{s_{2}}, \ldots, \frac{1}{s_{P}}$, along
the main diagonal positions.  Thus,  $(\mathbf{I}_{N \times N} -
(\frac{1}{N})\mathbf{E}_{N \times N})\mathbf{X}_{N \times P}$
produces a matrix with columns deviated from the column means; a
postmultiplication by $\mathbf{D}$ carries out the within column
division by the standard deviations.  Finally, if we define the
expression $(\frac{1}{N})(\mathbf{Z}'\mathbf{Z})_{P \times P} \equiv
\mathbf{R}_{P \times P}$, we have the familiar correlation
coefficient matrix among the $P$ variables.


\subsection{Matrix and Vector Orthogonality}

Two vectors, $\mathbf{x}$ and $\mathbf{y}$, are said to be
\emph{orthogonal} if $\mathbf{x}'\mathbf{y} = 0$, and would lie at
right angles when graphed.  If, in addition, $\mathbf{x}$ and
$\mathbf{y}$ are both of unit length (i.e.,
$\sqrt{\mathbf{x}'\mathbf{x}} = \sqrt{\mathbf{y}'\mathbf{y}} = 1$),
then they are said to be \emph{orthonormal}.  A square matrix
$\mathbf{T}_{U \times U}$ is said to be \emph{orthogonal} if its
rows form a set of mutually orthonormal vectors.  An example (called
a Helmert matrix of order 3) follows:

\[ \mathbf{T} = \left(
     \begin{array}{rrr}
       1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\
       1/\sqrt{2} & -1/\sqrt{2} & 0 \\
       -1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} \\
     \end{array}
   \right) \]

   There are several nice properties of orthogonal matrices that we
   will see again in our various discussions to follow:

   \smallskip

   (A) $\mathbf{T}\mathbf{T}' = \mathbf{T}'\mathbf{T} = \mathbf{I}$; thus, the inverse of $\mathbf{T}$ is $\mathbf{T}'$;

   \smallskip

   (B) the columns of $\mathbf{T}$ are orthonormal;

   \smallskip

   (C) $\det(\mathbf{T}) = \pm 1$;

   \smallskip

   (D) if $\mathbf{T}$ and $\mathbf{R}$ are orthogonal, then so is
   $\mathbf{T}\mathbf{R}$;

   \smallskip

   (E) vectors lengths do not change under an orthogonal
   transformation: to see this, let $\mathbf{y} = \mathbf{T}\mathbf{x}$; then


   \[ \mathbf{y}'\mathbf{y} =
   (\mathbf{T}\mathbf{x})'(\mathbf{T}\mathbf{x}) =
   \mathbf{x}'\mathbf{T}'\mathbf{T}\mathbf{x} =
   \mathbf{x}'\mathbf{I}\mathbf{x} = \mathbf{x}'\mathbf{x} \]

\subsection{Matrix Rank}

An arbitrary matrix, $\mathbf{A}$, of order $U \times V$ can be
written either in terms of its $U$ rows, say, $\mathbf{r}'_{1},
\mathbf{r}'_{2}, \ldots, \mathbf{r}'_{U}$ or its $V$ columns,
$\mathbf{c}_{1}, \mathbf{c}_{2}, \ldots, \mathbf{c}_{V}$, where

\[ \mathbf{r}'_{u} = \left(
                       \begin{array}{ccc}
                         a_{u1} & \cdots & a_{uV} \\
                       \end{array}
                     \right); \; \mathbf{c}_{v} = \left(
                                                     \begin{array}{c}
                                                       a_{1v} \\
                                                       \vdots \\
                                                       a_{Uv} \\
                                                     \end{array}
                                                   \right) \]

\noindent and

\[ \mathbf{A}_{U \times V} = \left(
                               \begin{array}{c}
                                 \mathbf{r}'_{1} \\
                                  \mathbf{r}'_{2} \\
                                 \vdots \\
                                 \mathbf{r}'_{U} \\
                               \end{array}
                             \right) =
                             \left(
                               \begin{array}{cccc}
                                  \mathbf{c}_{1} &  \mathbf{c}_{2} & \cdots &  \mathbf{c}_{V} \\
                               \end{array}
                             \right) \]

\noindent The maximum number of linearly independent rows of
$\mathbf{A}$ and the maximum number of linearly independent columns
is the same; this common number is defined to be the \emph{rank} of
$\mathbf{A}$.
 A matrix is said to be of \emph{full rank} is the rank is equal
to the minimum of $U$ and $V$.

\smallskip

Matrix rank has a number of useful properties:

\smallskip

(A) $\mathbf{A}$ and $\mathbf{A}'$ have the same rank;

\smallskip

(B) $\mathbf{A}'\mathbf{A}$, $\mathbf{A}\mathbf{A}'$, and
$\mathbf{A}$ have the same rank;

\smallskip

(C) the rank of a matrix is unchanged by a pre- or
postmultiplication by a nonsingular matrix;

\smallskip

(D) the rank of a matrix is unchanged by what are called elementary
row and column operations: (a) interchange of two rows or two
columns; (2) multiplication or a row or a column by a scalar; (3)
addition of a row (or column) to another row (or column). This is
true because any elementary operation can be represented by a
premultiplication (if the operation is to be on rows) or a
postmultiplication (if the operation is to be on columns) of a
nonsingular matrix.

\smallskip

To give a simple example, suppose we wish to perform some elementary
row and column operations on the matrix

\[ \left(
     \begin{array}{ccc}
       1 & 1 & 1 \\
       1 & 0 & 2 \\
       3 & 2 & 4 \\
     \end{array}
   \right) \]

   \noindent To interchange the first two rows of this latter matrix,
   interchange the first two rows of an identity matrix and
   premultiply; for the first two columns to be interchanged, carry
   out the operation on the identity and post-multiply:

   \[ \left(
        \begin{array}{ccc}
          0 & 1 & 0 \\
          1 & 0 & 0 \\
          0 & 0 & 1 \\
        \end{array}
      \right)
         \left(
     \begin{array}{ccc}
       1 & 1 & 1 \\
       1 & 0 & 2 \\
       3 & 2 & 4 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{ccc}
       1 & 0 & 2 \\
       1 & 1 & 1 \\
       3 & 2 & 4 \\
     \end{array}
   \right) \]

   \[ \left(
        \begin{array}{ccc}
          1 & 1 & 1 \\
          1 & 0 & 2 \\
          3 & 2 & 4 \\
        \end{array}
      \right)
\left(
     \begin{array}{ccc}
       0 & 1 & 0 \\
       1 & 0 & 0 \\
       0 & 0 & 1 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{ccc}
       1 & 1 & 1 \\
       0 & 1 & 2 \\
       2 & 3 & 4 \\
     \end{array}
   \right) \]

   \noindent To multiply a row of our example matrix (e.g., the second row by $5$), multiply the
   desired row of an identity matrix and
   premultiply; for multiplying a specific column (e.g., the second column by $5$), carry
   out the operation of the identity and post-multiply:

    \[ \left(
        \begin{array}{ccc}
          1 & 0 & 0 \\
          0 & 5 & 0 \\
          0 & 0 & 1 \\
        \end{array}
      \right)
\left(
     \begin{array}{ccc}
       1 & 1 & 1 \\
       1 & 0 & 2 \\
       3 & 2 & 4 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{ccc}
       1 & 1 & 1 \\
       5 & 0 & 10 \\
       3 & 2 & 4 \\
     \end{array}
   \right) \]

   \[ \left(
        \begin{array}{ccc}
          1 & 1 & 1 \\
          1 & 0 & 2 \\
          3 & 2 & 4 \\
        \end{array}
      \right)
\left(
     \begin{array}{ccc}
       1 & 0 & 0 \\
       0 & 5 & 0 \\
       0 & 0 & 1 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{ccc}
       1 & 5 & 1 \\
       1 & 0 & 2 \\
       3 & 10 & 4 \\
     \end{array}
   \right) \]

   \noindent To add one row to a second (e.g., the first row to the
   second), carry out the operation on the identity and
   premultiply; to add one column to a second (e.g., the first column
   to the second), carry out the operation of the identity and post-multiply:

 \[ \left(
        \begin{array}{ccc}
          1 & 0 & 0 \\
          1 & 1 & 0 \\
          0 & 0 & 1 \\
        \end{array}
      \right)
\left(
     \begin{array}{ccc}
       1 & 1 & 1 \\
       1 & 0 & 2 \\
       3 & 2 & 4 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{ccc}
       1 & 1 & 1 \\
       2 & 1 & 3 \\
       3 & 2 & 4 \\
     \end{array}
   \right) \]

   \[ \left(
        \begin{array}{ccc}
          1 & 1 & 1 \\
          1 & 0 & 2 \\
          3 & 2 & 4 \\
        \end{array}
      \right)
\left(
     \begin{array}{ccc}
       1 & 0 & 0 \\
       1 & 1 & 0 \\
       0 & 0 & 1 \\
     \end{array}
   \right) =
   \left(
     \begin{array}{ccc}
       1 & 2 & 1 \\
       1 & 1 & 2 \\
       3 & 5 & 4 \\
     \end{array}
   \right) \]

In general, by performing elementary row and column operations, any
$U \times V$ matrix can be reduced to a \emph{canonical form}:

\[\left(
    \begin{array}{cccccc}
      1 & \cdots & 0 & 0 & \cdots & 0 \\
      \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
      0 & \cdots & 1 & 0 & \cdots & 0 \\
      0 & \cdots & 0 & 0 & \cdots & 0 \\
      \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
      0 & \cdots & 0 & 0 & \cdots & 0 \\
    \end{array}
  \right) \]

  \noindent The rank of a matrix can then be found by counting the
  number of ones in the above matrix.

  \smallskip

  Given a $U \times V$ matrix, $\mathbf{A}$, there exist $s$ nonsingular
  elementary row operation matrices,
  $\mathbf{R}_{1},\ldots,\mathbf{R}_{s}$, and $t$ nonsingular
  elementary column operation matrices, $\mathbf{C}_{1},\ldots,\mathbf{C}_{t}$
  such that $\mathbf{R}_{s} \cdots \mathbf{R}_{1}\mathbf{A}\mathbf{C}_{1} \cdots
  \mathbf{C}_{t}$ is in canonical form.  Moreover, if $\mathbf{A}$
  is square ($U \times U$) and of full rank (i.e.,
  $\det(\mathbf{A}) \ne 0$), then there are $s$ nonsingular
  elementary row operation matrices,
    $\mathbf{R}_{1},\ldots,\mathbf{R}_{s}$, and $t$ nonsingular
  elementary column operation matrices,
  $\mathbf{C}_{1},\ldots,\mathbf{C}_{t}$,
  such that $\mathbf{R}_{s} \cdots \mathbf{R}_{1}\mathbf{A} = \mathbf{I}$ or
  $\mathbf{A}\mathbf{C}_{1} \cdots \mathbf{C}_{t} = \mathbf{I}$.  Thus,
  $\mathbf{A}^{-1}$ can be found either as $ \mathbf{R}_{s} \cdots
  \mathbf{R}_{1}$ or as $\mathbf{C}_{1} \cdots \mathbf{C}_{t}$.  In
  fact, a common way in which an inverse is calculated ``by hand''
  starts with both  $\mathbf{A}$ and $\mathbf{I}$ on the same
  sheet of paper; when reducing $\mathbf{A}$ step-by-step, the same
  operations are then applied to $\mathbf{I}$, building up the inverse
  until the canonical form is reached in the reduction of $\mathbf{A}$.


\subsection{Using Matrices to Solve Equations}

Suppose we have a set of $U$ equations in $V$ unknowns:

\[ \begin{array}{ccccccc}
     a_{11}x_{1} & + & \cdots & + & a_{1V}x_{1}  & = & c_{1} \\
     \vdots & \vdots & \cdots & \vdots & \vdots & \vdots & \vdots \\
      a_{U1}x_{1} & + & \cdots & + & a_{UV}x_{V}  & = & c_{U} \\
      \end{array}
       \]

\noindent If we let

\[ \mathbf{A} = \left(
                  \begin{array}{ccc}
                    a_{11} & \cdots & a_{1V} \\
                    \vdots & \ddots & \vdots \\
                    a_{U1} & \cdots & a_{UV} \\
                  \end{array}
                \right); \;
       \mathbf{x} = \left(
  \begin{array}{c}
    x_{1} \\
    \vdots \\
    x_{V} \\
  \end{array}
\right); \; \mathbf{c} = \left(
  \begin{array}{c}
    c_{1} \\
    \vdots \\
    c_{U} \\
  \end{array}
\right)
      \]

      \noindent then the equations can be written as follows:
      $\mathbf{A}_{U \times V}\mathbf{x}_{V \times 1} = \mathbf{c}_{U \times 1}$.  In the simplest
      instance, $\mathbf{A}$ is square and nonsingular, implying
      that a solution may be given simply as $\mathbf{x} =
      \mathbf{A}^{-1}\mathbf{c}$.  If there are fewer (say, $S < V$  linearly independent) equations than
      unknowns (so, $S$ is the rank of $\mathbf{A}$), then we can solve for $S$ unknowns in terms of the
      constants $c_{1},\ldots,c_{U}$ and the remaining $V - S$
      unknowns.  We will see how this works in our discussion of obtaining
      eigenvectors that correspond to certain eigenvalues in a
      section to follow.  Generally, the set of equations is said to
      be \emph{consistent} if a solution exists, i.e., a linear
      combination of the column vectors of $\mathbf{A}$ can be used
      to define $\mathbf{c}$:

      \[ x_{1}\left(
                \begin{array}{c}
                  a_{11} \\
                  \vdots \\
                  a_{U1} \\
                \end{array}
              \right) + \cdots + x_{V}\left(
                                        \begin{array}{c}
                                          a_{1V} \\
                                          \vdots \\
                                          a_{UV} \\
                                        \end{array}
                                      \right) = \left(
                                                  \begin{array}{c}
                                                    c_{1} \\
                                                    \vdots \\
                                                    c_{U} \\
                                                  \end{array}
                                                \right) \]


\noindent or the augmented matrix $(\mathbf{A} \; \mathbf{c})$ has
the same rank as $\mathbf{A}$; otherwise no solution exists and the
system of equations is said to be \emph{inconsistent}.


\subsection{Quadratic Forms}

Suppose $\mathbf{A}_{U \times U}$ is symmetric and let $\mathbf{x}'
= (x_{1}, \dots, x_{U})$.  A \emph{quadratic form} is defined by

\[ \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{u=1}^{U}\sum_{v=1}^{U}
a_{uv}x_{u}x_{v} = \] \[ a_{11}x_{1}^{2} + a_{22}x_{2}^{2} + \cdots
+ a_{UU}x_{U}^{2} + 2a_{12}x_{1}x_{2} + \cdots + 2a_{1U}x_{1}x_{U} +
\cdots + 2a_{(U-1)U}x_{U-1}x_{U} \]

\noindent For example, $\sum_{u=1}^{U}(x_{u} - \bar{x})^{2}$, where
$\bar{x}$ is the mean of the entries in $\mathbf{x}$, is a quadratic
form because it can be written as

\[ \left(
     \begin{array}{c}
       x_{1} \\
       x_{2} \\
       \vdots \\
       x_{U} \\
     \end{array}
   \right)'
   \left(
     \begin{array}{cccc}
       (U-1)/U & -1/U & \cdots & -1/U \\
       -1/U & (U-1)/U & \cdots & -1/U \\
       \vdots & \vdots & \ddots & \vdots \\
       -1/U & -1/U & \cdots & (U-1)/U \\
     \end{array}
   \right)
   \left(
     \begin{array}{c}
       x_{1} \\
       x_{2}\\
       \vdots \\
       x_{U} \\
     \end{array}
   \right) \]

   \noindent Due to the ubiquity of sum-of-squares in
   statistics, it should be of no surprise that quadratic forms play a
   central role in multivariate analysis.

   \smallskip

   A symmetric matrix $\mathbf{A}$ (and associated quadratic form)
   are called \emph{positive definite} (p.d.) if
   $\mathbf{x}'\mathbf{A}\mathbf{x} > 0$ for all $\mathbf{x} \ne
   \mathbf{0}$ (the zero vector); if $\mathbf{x}'\mathbf{A}\mathbf{x} \ge 0$ for all
   $\mathbf{x}$, then $\mathbf{A}$ is \emph{positive semi-definite} (p.s.d).
    We could have negative definite, negative semi-definite, and
    indefinite forms as well.  Note that a correlation or
    covariance matrix is at least positive semi-definite, and
    satisfies the stronger condition of being
    positive definite if  the vectors of the variables on
    which the correlation or covariance matrix is based, are linearly
    independent.

\subsection{Multiple Regression}

One of the most common topics in any beginning statistics class is
\emph{multiple regression} that we now formulate (in matrix terms)
as the relation between a dependent random variable $Y$ and a
collection of $K$ independent variables, $X_{1}, X_{2}, \ldots,
X_{K}$.  Suppose we have $N$ subjects on which we observe $Y$, and
arrange these values into an $N \times 1$ vector:

\[ \mathbf{Y} = \left(
\begin{array}{c}
  Y_{1} \\
  Y_{2} \\
  \vdots \\
  Y_{N}\\
\end{array}
\right) \]

\noindent The observations on the $K$ independent variables are also
placed in vectors:

\[ \mathbf{X}_{1} = \left(
\begin{array}{c}
  X_{11} \\
  X_{21} \\
  \vdots \\
  X_{N1} \\
\end{array}
\right) ; \; \mathbf{X}_{2} = \left(
\begin{array}{c}
  X_{12} \\
  X_{22} \\
  \vdots \\
  X_{N2} \\
\end{array}
\right) ; \; \ldots ; \; \mathbf{X}_{K} = \left(
\begin{array}{c}
  X_{1K} \\
  X_{2K} \\
  \vdots \\
  X_{NK} \\
\end{array}
\right) \]

\noindent It would be simple if the vector $\mathbf{Y}$ were
linearly dependent on $\mathbf{X}_{1}, \mathbf{X}_{2}, \ldots,
\mathbf{X}_{K}$ because then

\[ \mathbf{Y} = b_{1}\mathbf{X}_{1} + b_{2}\mathbf{X}_{2} + \cdots +
b_{K}\mathbf{X}_{K} \]

\noindent for some values $b_{1}, \ldots, b_{K}$.  We could always
write for \emph{any} values of $b_{1}, \ldots, b_{K}$:

\[ \mathbf{Y} = b_{1}\mathbf{X}_{1} + b_{2}\mathbf{X}_{2} + \cdots +
b_{K}\mathbf{X}_{K} + \mathbf{e} \]

\noindent where \[ \mathbf{e} = \left(
\begin{array}{c}
  e_{1} \\
  \vdots \\
  e_{N} \\
\end{array}
\right) \]

\noindent is an error vector.  To formulate our task as an
optimization problem (least-squares), we wish to find a good set of
weights, $b_{1}, \ldots, b_{K}$, so the length of $\mathbf{e}$ is
minimized, i.e., $\mathbf{e}'\mathbf{e}$ is made as small as
possible.

\smallskip

As notation, let

\[ \mathbf{Y}_{N \times 1} = \mathbf{X}_{N \times K}\mathbf{b}_{K
\times 1} + \mathbf{e}_{N \times 1} \]

\noindent where

\[ \mathbf{X} =\left(
\begin{array}{ccc}
  \mathbf{X}_{1}  &  \ldots  & \mathbf{X}_{K} \\
\end{array}
\right); \; \mathbf{b} = \left(
\begin{array}{c}
  b_{1} \\
  \vdots \\
  b_{K} \\
\end{array}
\right) \]

\noindent To minimize $\mathbf{e}'\mathbf{e} = (\mathbf{Y} -
\mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b})$, we use
the vector $\mathbf{b}$ that satisfies what are called the normal
equations:

\[  \mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y} \]

\noindent If $\mathbf{X}'\mathbf{X}$ is nonsingular (i.e.,
$\det(\mathbf{X}'\mathbf{X}) \ne 0$; or equivalently,
$\mathbf{X}_{1} , \ldots , \mathbf{X}_{K}$ are linearly
independent), then

\[ \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \]

\noindent The vector that is ``closest'' to $\mathbf{Y}$ in our
least-squares sense, is $\mathbf{X}\mathbf{b}$; this is a linear
combination of the columns of $\mathbf{X}$ (or in other jargon,
$\mathbf{X}\mathbf{b}$ defines the \emph{projection} of $\mathbf{Y}$
into the space defined by (all linear combinations of) the columns
of $\mathbf{X}$.

\smallskip

In statistical uses of multiple regression, the estimated
variance-covariance matrix of the regression coefficients, $b_{1},
\ldots, b_{K}$,  is given as
$(\frac{1}{N-K})\mathbf{e}'\mathbf{e}(\mathbf{X}'\mathbf{X})^{-1}$,
where $(\frac{1}{N-K})\mathbf{e}'\mathbf{e}$ is an (unbiased)
estimate of the error variance for the distribution from which the
errors are assumed drawn.  Also, in multiple regression instances
that usually involve an additive constant, the latter is obtained
from a weight attached to an independent variable  defined to be
identically one.

\smallskip

In multivariate multiple regression where there are, say, $T$
dependent variables (each represented by an $N \times 1$ vector),
the dependent vectors are merely concatenated together into an $N
\times T$ matrix, $\mathbf{Y}_{N \times T}$; the solution to the
normal equations now produces a matrix $\mathbf{B}_{K \times T} =
(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ of regression
coefficients.  In effect, this general expression just uses each of
the dependent variables separately and adjoins all the results.

\section{Eigenvectors and Eigenvalues}

Suppose we are given a square matrix, $\mathbf{A}_{U \times U}$, and
consider the polynomial $\det(\mathbf{A} - \lambda\mathbf{I})$ in
the unknown value $\lambda$, referred to as Laplace's expansion:

\[ \det(\mathbf{A} - \lambda\mathbf{I}) = (-\lambda)^{U} +
S_{1}(-\lambda)^{U-1} + \cdots + S_{U-1}(-\lambda)^{-1} +
S_{U}(-\lambda)^{0} \]

\noindent where $S_{u}$ is the sum of all $u \times u$ principal
minor determinants. A \emph{principal} minor determinant is obtained
from a submatrix formed from $\mathbf{A}$ that has $u$ diagonal
elements left in it.  Thus, $S_{1}$ is the trace of $\mathbf{A}$ and
$S_{U}$ is the determinant.

\smallskip

There are $U$  roots, $\lambda_{1}, \ldots, \lambda_{U}$, of the
equation $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$, given that the
left-hand-side is a $U^{th}$ degree polynomial.  The roots are
called the \emph{eigenvalues} of $\mathbf{A}$.   There are a number
 of properties of eigenvalues that prove generally
useful:

\smallskip

(A) $\det{\mathbf{A}} = \prod_{u=1}^{U} \lambda_{u}$;
trace($\mathbf{A}) = \sum_{u=1}^{U} \lambda_{u}$;

\smallskip

(B) if $\mathbf{A}$ is symmetric with real elements, then all
$\lambda_{u}$ are real;

\smallskip

(C) if $\mathbf{A}$ is positive definite, then all $\lambda_{u}$ are
positive (strictly greater than zero); if $\mathbf{A}$ is positive
semi-definite, then all $\lambda_{u}$ are nonnegative (greater than
or equal to zero);

\smallskip

(D) if $\mathbf{A}$ is symmetric and positive semi-definite with
rank $R$, then there are $R$ positive roots and $U-R$ zero roots;

\smallskip

(E) the nonzero roots of $\mathbf{A}\mathbf{B}$ are equal to those
of $\mathbf{B}\mathbf{A}$; thus, the trace of $\mathbf{A}\mathbf{B}$
is equal to the trace of $\mathbf{B}\mathbf{A}$;

\smallskip

(F) eigenvalues of a diagonal matrix are the diagonal elements
themselves;

\smallskip

(G) for any $U \times V$ matrix $\mathbf{B}$, the ranks of
$\mathbf{B}$, $\mathbf{B}'\mathbf{B}$, and $\mathbf{B}\mathbf{B}'$
are all the same.  Thus, because $\mathbf{B}'\mathbf{B}$ (and
$\mathbf{B}\mathbf{B}'$) are symmetric and positive semi-definite
(i.e., $\mathbf{x}'(\mathbf{B}'\mathbf{B})\mathbf{x} \ge 0$ because
$(\mathbf{B}\mathbf{x})'(\mathbf{B}\mathbf{x})$ is a sum-of-squares
which is always nonnegative), we can use (D) to find the rank of
$\mathbf{B}$ by counting the positive roots of
$\mathbf{B}'\mathbf{B}$.

\smallskip

We carry through a small example below:

\[ \mathbf{A} = \left(
\begin{array}{ccc}
  7 & 0 & 1 \\
  0 & 7 & 2 \\
  1 & 2 & 3 \\
\end{array}
\right) \]

\[ S_{1} = \mathrm{trace}(\mathbf{A}) = 17 \] \[ S_{2} = \det(\left(
\begin{array}{cc}
  7 & 0 \\
  0 & 7 \\
\end{array}
\right)) + \det(\left(
\begin{array}{cc}
  7 & 1 \\
  1 & 3 \\
\end{array}
\right)) + \det(\left(
\begin{array}{cc}
  7 & 2 \\
  2 & 3 \\
\end{array}
\right)) = 49 + 20 + 17 = 86 \] \[ S_{3} = \det(\mathbf{A}) = 147 +
0 + 0 - 7 - 28 - 0 = 112
\]

\noindent Thus,

\[ \det(\mathbf{A} - \lambda\mathbf{I}) = (-\lambda)^{3} +
17(-\lambda)^{2} + 86(-\lambda)^{1} + 112 = \] \[ -\lambda^{3} +
17\lambda^{2} - 86\lambda + 112 = -(\lambda -2)(\lambda -
8)(\lambda-7) = 0 \]

\noindent which gives roots of 2, 8, and 7.

\smallskip

If $\lambda_{u}$ is an eigenvalue of $\mathbf{A}$, then the
equations $[\mathbf{A} - \lambda_{u}\mathbf{I}]\mathbf{x}_{u} = 0$
have a nontrivial solution (i.e., the determinant of $\mathbf{A} -
\lambda_{u}\mathbf{I}$ vanishes, and so the inverse of $\mathbf{A} -
\lambda_{u}\mathbf{I}$ does not exist).  The solution is called an
\emph{eigenvector} (associated with the corresponding eigenvalue),
and can be characterized by the following condition:

\[ \mathbf{A}\mathbf{x}_{u} = \lambda_{u}\mathbf{x}_{u} \]

\noindent An eigenvector is determined up to a scale factor only, so
typically we normalize to unit length (which then gives a $\pm$
option for the two possible unit length solutions).

\smallskip

We continue our simple example and find the corresponding
eigenvalues: when $\lambda = 2$, we have the equations (for
$[\mathbf{A} - \lambda\mathbf{I}]\mathbf{x} = \mathbf{0}$)

\[ \left(
\begin{array}{ccc}
  5 & 0 & 1 \\
  0 & 5 & 2 \\
  1 & 2 &  1 \\
\end{array}
\right) \left(
\begin{array}{c}
  x_{1} \\
  x_{2} \\
  x_{3} \\
\end{array}
\right) =
\left(%
\begin{array}{c}
  0 \\
  0 \\
  0 \\
\end{array}%
\right) \]

\noindent with an arbitrary solution of

\[ \left(%
\begin{array}{r}
  -\frac{1}{5}a \\
  -\frac{2}{5}a \\
  a \\
\end{array}
\right) \]

\noindent Choosing $a$ to be $+\frac{5}{\sqrt{30}}$ to obtain one of
the two possible normalized solutions, we have as our final
eigenvector for $\lambda = 2$:

\[ \left(
\begin{array}{r}
  -\frac{1}{\sqrt{30}} \\
  -\frac{2}{\sqrt{30}} \\
  \frac{5}{\sqrt{30}} \\
\end{array}
\right) \]

\noindent For $\lambda = 7$ we will use the normalized eigenvector
of

\[\left(
\begin{array}{r}
  -\frac{2}{\sqrt{5}} \\
  \frac{1}{\sqrt{5}} \\
  0 \\
\end{array}
\right) \]

\noindent and for $\lambda = 8$,

\[ \left(
\begin{array}{r}
  \frac{1}{\sqrt{6}} \\
  \frac{2}{\sqrt{6}} \\
  \frac{1}{\sqrt{6}} \\
\end{array}
\right) \]

One of the interesting properties of eigenvalues/eigenvectors for a
symmetric matrix $\mathbf{A}$ is that if $\lambda_{u}$ and
$\lambda_{v}$ are distinct eigenvalues, then the corresponding
eigenvectors, $\mathbf{x}_{u}$ and $\mathbf{x}_{v}$, are orthogonal
(i.e., $\mathbf{x}_{u}'\mathbf{x}_{v} = 0$).  We can show this in
the following way: the defining conditions of

\[ \mathbf{A}\mathbf{x}_{u} = \lambda_{u}\mathbf{x}_{u} \]

\[ \mathbf{A}\mathbf{x}_{v} = \lambda_{v}\mathbf{x}_{v} \]


\noindent lead to

\[ \mathbf{x}_{v}'\mathbf{A}\mathbf{x}_{u} =
\mathbf{x}_{v}'\lambda_{u}\mathbf{x}_{u}
\]

\[ \mathbf{x}_{u}'\mathbf{A}\mathbf{x}_{v} =
\mathbf{x}_{u}'\lambda_{v}\mathbf{x}_{v}
\]

\noindent Because $\mathbf{A}$ is symmetric and the left-hand-sides
of these two expressions are equal (they are one-by-one matrices and
equal to their own transposes), the right-hand-sides must also be
equal.  Thus,

\[ \mathbf{x}_{v}'\lambda_{u}\mathbf{x}_{u} =
\mathbf{x}_{u}'\lambda_{v}\mathbf{x}_{v} \Rightarrow \]

\[ \mathbf{x}_{v}'\mathbf{x}_{u}\lambda_{u} =
\mathbf{x}_{u}'\mathbf{x}_{v}\lambda_{v} \]

\noindent Due to the equality of $\mathbf{x}_{v}'\mathbf{x}_{u}$ and
$\mathbf{x}_{u}'\mathbf{x}_{v}$, and by assumption, $\lambda_{u} \ne
\lambda_{v}$, the inner product $\mathbf{x}_{v}'\mathbf{x}_{u}$ must
be zero for the last displayed equality to hold.

\smallskip

In summary of the above discussion, for every real symmetric matrix
$\mathbf{A}_{U \times U}$, there exists an orthogonal matrix
$\mathbf{P}$ (i.e., $\mathbf{P}'\mathbf{P} = \mathbf{P}\mathbf{P}' =
\mathbf{I}$) such that $\mathbf{P}'\mathbf{A}\mathbf{P} =
\mathbf{D}$, where $\mathbf{D}$ is a diagonal matrix containing the
eigenvalues of $\mathbf{A}$, and

\[ \mathbf{P} = \left(
\begin{array}{ccc}
  \mathbf{p}_{1} & \ldots & \mathbf{p}_{U} \\
\end{array}
\right) \]

\noindent where $\mathbf{p}_{u}$ is a normalized eigenvector
associated with $\lambda_{u}$ for $1 \le u \le U$.  If the
eigenvalues are not distinct, it is still possible to choose the
eigenvectors to be orthogonal. Finally, because $\mathbf{P}$ is an
orthogonal matrix (and $\mathbf{P}'\mathbf{A}\mathbf{P} = \mathbf{D}
\Rightarrow
\mathbf{P}\mathbf{P}'\mathbf{A}\mathbf{}\mathbf{P}\mathbf{P}' =
\mathbf{P}\mathbf{D}\mathbf{P}'$), we can finally represent
$\mathbf{A}$ as

\[ \mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}' \]

In terms of the small numerical example being used, we have for
$\mathbf{P}'\mathbf{A}\mathbf{P} = \mathbf{D}$:

\[ \left(
\begin{array}{rrr}
  -\frac{1}{\sqrt{30}} & -\frac{2}{\sqrt{30}} & \frac{5}{\sqrt{30}} \\
  -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \\
  \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\
\end{array}
\right)
 \left(
\begin{array}{rrr}
  7 & 0 & 1 \\
  0 & 7 & 2 \\
  1 & 2 & 3 \\
\end{array}
\right)
 \left(
\begin{array}{rrr}
  -\frac{1}{\sqrt{30}} & -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{6}} \\
  -\frac{2}{\sqrt{30}} & \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{6}} \\
  \frac{5}{\sqrt{30}} & 0 & \frac{1}{\sqrt{6}} \\
\end{array} \right) = \] \[
\left(
\begin{array}{rrr}
  2 & 0 & 0 \\
  0 & 7 & 0 \\
  0 & 0 & 8 \\
\end{array} \right)  \]


\noindent and for $\mathbf{P}\mathbf{D}\mathbf{P}' = \mathbf{A}$:

\[ \left(
\begin{array}{rrr}
  -\frac{1}{\sqrt{30}} & -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{6}} \\
  -\frac{2}{\sqrt{30}} & \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{6}} \\
  \frac{5}{\sqrt{30}} & 0 & \frac{1}{\sqrt{6}} \\
\end{array}
\right)
 \left(
\begin{array}{rrr}
  2 & 0 & 0 \\
  0 & 7 & 0 \\
  0 & 0 & 8 \\
\end{array}
\right)
 \left(
\begin{array}{rrr}
  -\frac{1}{\sqrt{30}} & -\frac{2}{\sqrt{30}} & \frac{5}{\sqrt{30}} \\
  -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \\
  \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\
\end{array} \right) = \] \[
\left(
\begin{array}{rrr}
  7 & 0 & 1 \\
  0 & 7 & 2 \\
  1 & 2 & 3 \\
\end{array} \right)  \]


The representation of $\mathbf{A}$ as
$\mathbf{P}\mathbf{D}\mathbf{P}'$ leads to several rather nice
computational ``tricks.''  First, if $\mathbf{A}$ is p.s.d., we can
define

\[ \mathbf{D}^{1/2} \equiv \left(
                       \begin{array}{ccc}
                         \sqrt{\lambda_{1}} & \ldots & 0 \\
                         \vdots & \ddots & \vdots \\
                         0 & \ldots &   \sqrt{\lambda_{U}} \\
                       \end{array}
                     \right) \]

                     \noindent and
represent $\mathbf{A}$ as

\[ \mathbf{A} =
\mathbf{P}\mathbf{D}^{1/2}\mathbf{D}^{1/2}\mathbf{P}' =
\mathbf{P}\mathbf{D}^{1/2}(\mathbf{P}\mathbf{D}^{1/2})' =
\mathbf{L}\mathbf{L}',
 \mathrm{say}. \]

\noindent In other words, we have ``factored'' $\mathbf{A}$ into
$\mathbf{L}\mathbf{L}'$, for
\[ \mathbf{L} =
\mathbf{P}\mathbf{D}^{1/2} = \left(
                              \begin{array}{cccc}
                                \sqrt{\lambda_{1}}\mathbf{p}_{1} & \sqrt{\lambda_{2}}\mathbf{p}_{2} & \ldots & \sqrt{\lambda_{U}}\mathbf{p}_{U} \\
                              \end{array}
                            \right) \]

                            \noindent Secondly, if $\mathbf{A}$ is p.d., we can
define

\[ \mathbf{D}^{-1} \equiv \left(
                       \begin{array}{ccc}
                         \frac{1}{\lambda_{1}} & \ldots & 0 \\
                         \vdots & \ddots & \vdots \\
                         0 & \ldots &    \frac{1}{\lambda_{U}} \\
                       \end{array}
                     \right) \]

                     \noindent and
represent $\mathbf{A}^{-1}$ as

\[ \mathbf{A}^{-1} =
\mathbf{P}\mathbf{D}^{-1}\mathbf{P}' \]

\noindent To verify,
\[ \mathbf{A}\mathbf{A}^{-1} =
(\mathbf{P}\mathbf{D}\mathbf{P}')(\mathbf{P}\mathbf{D}^{-1}\mathbf{P}')
= \mathbf{I} \] \noindent Thirdly, to define a ``square root''
matrix, let $\mathbf{A}^{1/2} \equiv
\mathbf{P}\mathbf{D}^{1/2}\mathbf{P}'$. To verify,
$\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{P}\mathbf{D}\mathbf{P}'
= \mathbf{A}$.

\smallskip

There is a generally interesting way to represent the multiplication
of two matrices considered as collections of column and row vectors,
respectively, where the final answer is a sum of outer products of
vectors.  This view will prove particularly useful in our discussion
of principal component analysis.  Suppose we have two matrices
$\mathbf{B}_{U \times V}$, represented as a collection of its $V$
columns:

\[ \mathbf{B} = \left(
                 \begin{array}{cccc}
                   \mathbf{b}_{1} &  \mathbf{b}_{2} & \ldots &  \mathbf{b}_{V} \\
                 \end{array}
               \right) \]
 \noindent and $\mathbf{C}_{V \times W}$, represented as a collection of its
 $V$ rows:
 \[ \mathbf{C} = \left(
                   \begin{array}{c}
                     \mathbf{c}_{1}' \\
                      \mathbf{c}_{2}' \\
                     \vdots \\
                      \mathbf{c}_{V}' \\
                   \end{array}
                 \right) \]
                 \noindent The product $\mathbf{B}\mathbf{C} =
                 \mathbf{D}$ can  be written as

                 \[  \mathbf{B}\mathbf{C} = \left(
                 \begin{array}{cccc}
                   \mathbf{b}_{1} &  \mathbf{b}_{2} & \ldots &  \mathbf{b}_{V} \\
                 \end{array}
               \right)  \left(
                   \begin{array}{c}
                     \mathbf{c}_{1}' \\
                      \mathbf{c}_{2}' \\
                     \vdots \\
                      \mathbf{c}_{V}' \\
                   \end{array}
                 \right) = \] \[ \mathbf{b}_{1}\mathbf{c}_{1}' +
                 \mathbf{b}_{2}\mathbf{c}_{2}' + \cdots +
                 \mathbf{b}_{V}\mathbf{c}_{V}' = \mathbf{D} \]

                 As an example, consider the \emph{spectral
                 decomposition} of $\mathbf{A}$ considered above as
                 $\mathbf{P}\mathbf{D}\mathbf{P}'$, and where from
                 now on, without loss of any generality, the
                 diagonal entries  in $\mathbf{D}$ are ordered as
                 $\lambda_{1} \ge \lambda_{2} \ge \cdots \ge
                 \lambda_{U}$.  We can represent $\mathbf{A}$ as
                  \[ \mathbf{A}_{U \times U} = \left(
                                                 \begin{array}{ccc}
                                                   \sqrt{\lambda_{1}}\mathbf{p}_{1} & \ldots & \sqrt{\lambda_{U}}\mathbf{p}_{U} \\
                                                 \end{array}
                                               \right)
                                               \left(
                                                 \begin{array}{c}
                                                   \sqrt{\lambda_{1}}\mathbf{p}_{1}'
                                                   \\
                                                   \vdots \\
                                                   \sqrt{\lambda_{U}}\mathbf{p}_{U}'
                                                   \\
                                                 \end{array}
                                               \right) = \] \[
                                               \lambda_{1}\mathbf{p}_{1}\mathbf{p}_{1}'
                                               + \cdots +
                                               \lambda_{U}\mathbf{p}_{U}\mathbf{p}_{U}'
                                               \]

\noindent If $\mathbf{A}$ is p.s.d. and of rank $R$, then the above
sum obviously stops at $R$ components. In general, the matrix
$\mathbf{B}_{U \times U}$ that is a rank $K$ ($ \le R$)
least-squares approximation to $\mathbf{A}$ can be given by \[
\mathbf{B} = \lambda_{1}\mathbf{p}_{1}\mathbf{p}_{1}'
                                               + \cdots +
                                               \lambda_{k}\mathbf{p}_{K}\mathbf{p}_{K}'
                                               \]

                                               \noindent and the value
                                               of the loss function:
                                               \[ \sum_{v=1}^{U}
                                               \sum_{u=1}^{U}
                                               (a_{uv} - b_{uv})^{2}
                                               = \lambda_{K+1}^{2}
                                               + \cdots +
                                               \lambda_{U}^{2}
                                               \]


\section{The Singular Value Decomposition of a Matrix}

The \emph{singular value decomposition} (SVD) or the \emph{basic
structure} of a matrix refers to the representation of \emph{any}
rectangular $U \times V$ matrix, say, $\mathbf{A}$, as a triple
product:

\[ \mathbf{A}_{U \times V} =  \mathbf{P}_{U \times R} \mathbf{\Delta}_{R \times R}
\mathbf{Q}_{R \times V}' \]

\noindent where the $R$ columns of $\mathbf{P}$ are orthonormal; the
$R$ rows of $\mathbf{Q}'$ are orthonormal;  $\mathbf{\Delta}$ is
diagonal with ordered positive entries, $\delta_{1} \ge \delta_{2}
\ge \cdots \ge \delta_{R} > 0$; and $R$ is the rank of $\mathbf{A}$.
Or, alternatively, we can ``fill up'' this decomposition as

\[ \mathbf{A}_{U \times V} =  \mathbf{P}_{U \times U}^{*}
\mathbf{\Delta}_{U \times V}^{*} \mathbf{Q}_{V \times V}^{*'} \]

\noindent where the columns of $\mathbf{P}^{*}$ and rows of
$\mathbf{Q}^{*'}$ are still orthonormal, and the diagonal matrix
$\mathbf{\Delta}$ forms the upper-left-corner of
$\mathbf{\Delta}^{*}$:

\[ \mathbf{\Delta}^{*} = \left(
                        \begin{array}{cc}
                          \mathbf{\Delta} & \mathbf{\emptyset} \\
                          \mathbf{\emptyset} & \mathbf{\emptyset} \\
                        \end{array}
                      \right) \]

                      \noindent here, $\mathbf{\emptyset}$ represents an
                      appropriately dimensioned matrix of all zeros.
                      In analogy to the least-squares result of the
                      last section, if a rank $K$ ($\le R$) matrix
                      approximation to $\mathbf{A}$ is desired, say
                      $\mathbf{B}_{U \times V}$, the first $K$
                      ordered entries in $\mathbf{\Delta}$ are
                      taken:

                       \[ \mathbf{B} =
\delta_{1}\mathbf{p}_{1}\mathbf{q}_{1}'
                                               + \cdots +
                                               \delta_{K}\mathbf{p}_{K}\mathbf{q}_{K}'
                                               \]

                                               \noindent and the value
                                               of the loss function:
                                               \[ \sum_{v=1}^{V}
                                               \sum_{u=1}^{U}
                                               (a_{uv} - b_{uv})^{2}
                                               = \delta_{K+1}^{2}
                                               + \cdots +
                                               \delta_{R}^{2}
                                               \]
                                               \noindent This latter
                                               result of approximating one matrix (least-squares) by another of lower rank, is referred to
                                               as the Eckart-Young
                                               theorem in the
                                               psychometric
                                               literature.

                                               \smallskip

 Once one has the SVD of a matrix, a lot of representation needs can be expressed in terms of it.
 For example,  suppose $\mathbf{A} =  \mathbf{P}\mathbf{\Delta}\mathbf{Q}'$; the spectral decomposition of
$\mathbf{A}\mathbf{A}'$ can then be given as
\[ (\mathbf{P}\mathbf{\Delta}\mathbf{Q}')(
\mathbf{P}\mathbf{\Delta}\mathbf{Q}')' =
\mathbf{P}\mathbf{\Delta}\mathbf{Q}'\mathbf{Q}\mathbf{\Delta}\mathbf{P}'
=  \mathbf{P}\mathbf{\Delta}\mathbf{\Delta}\mathbf{P}' =
\mathbf{P}\mathbf{\Delta}^{2}\mathbf{P}' \]

\noindent Similarly, the spectral decomposition of
$\mathbf{A}'\mathbf{A}$ is expressible as
$\mathbf{Q}\mathbf{\Delta}^{2}\mathbf{Q}'$.


\section{Common Multivariate Methods in Matrix Terms}

In this section we give a very brief overview of some common methods
of multivariate analysis in terms of the matrix ideas we have
introduced thus far in this chapter.  Later chapters (if they ever get written) will come back
to these topics and develop them in more detail.

\subsection{Principal Components}

Suppose we have a data matrix $\mathbf{X}_{N \times P}$ =
$\{x_{ij}\}$, with  $x_{ij}$ referring as usual to the observation
for subject $i$ on variable or column $j$:

\[ \mathbf{X}_{N \times P} =
\left(
  \begin{array}{cccc}
    x_{11} & x_{12} & \cdots & x_{1P} \\
    x_{21} & x_{22} & \cdots & x_{2P} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{N1} & x_{N2} & \cdots & x_{NP} \\
  \end{array}
\right) \]

\noindent The columns can be viewed as containing $N$ observations
on each of $P$ random variables that we denote generically by
$X_{1}, X_{2}, \ldots, X_{P}$.  We let $\mathbf{A}$ denote the $P
\times P$ sample covariance matrix obtained among the variables from
$\mathbf{X}$, and let $\lambda_{1} \ge \cdots \ge \lambda_{P} \ge 0$
be its $P$ eigenvalues and $\mathbf{p}_{1}, \ldots, \mathbf{p}_{P}$
the corresponding normalized eigenvectors.  Then, the linear
combination
\[ \mathbf{p}_{k}'\left(
                   \begin{array}{c}
                     X_{1} \\
                     \vdots \\
                     X_{P} \\
                   \end{array}
                 \right) \]
                 \noindent is called the $k^{th}$ (sample) \emph{principal
                 component}.

                 \smallskip

There are (at least) two interesting properties of principal
components to bring up at this time:

\smallskip

A) The $k^{th}$ principal component has maximum variance among all
linear combinations defined by unit length vectors  orthogonal to
$\mathbf{p}_{1}, \ldots, \mathbf{p}_{k-1}$; also, it is uncorrelated
with the components up to $k-1$;

\smallskip

B) $\mathbf{A} \approx \lambda_{1}\mathbf{p}_{1}\mathbf{p}_{1}' +
\cdots + \lambda_{K}\mathbf{p}_{K}\mathbf{p}_{K}'$ gives a
least-squares rank $K$ approximation to $\mathbf{A}$ (a special case
of the Eckart-Young theorem for an arbitrary symmetric matrix).

\subsection{Discriminant Analysis}

Suppose we have a one-way analysis-of-variance (ANOVA) layout with
$J$ groups ($n_{j}$ subjects in group $j$, $1 \le j \le J$), and $P$
measurements on each subject.  If $x_{ijk}$ denotes person $i$, in
group $j$, and the observation of variable $k$ ($1 \le i \le n_{j}$;
$1 \le j \le J$; $1 \le k \le P$), then define the
Between-Sum-of-Squares matrix

\[\mathbf{B}_{P \times P} =
\{\sum_{j=1}^{J} n_{j}(\bar{x}_{\cdot jk} - \bar{x}_{\cdot \cdot
k})(\bar{x}_{\cdot jk'} - \bar{x}_{\cdot \cdot k'})\}_{P \times P}
\]

\noindent and the Within-Sum-of-Squares matrix

 \[\mathbf{W}_{P
\times P} = \{ \sum_{j=1}^{J} \sum_{i=1}^{n_{j}} (x_{ijk} -
 \bar{x}_{\cdot jk})(x_{ijk'} - \bar{x}_{\cdot jk'}) \}_{P \times P} \]

 For the matrix product $\mathbf{W}^{-1}\mathbf{B}$, let
 $\lambda_{1}, \ldots, \lambda_{T} \ge 0$ be the eigenvectors ($T =
 \min(P, J-1)$, and $\mathbf{p}_{1}, \ldots, \mathbf{p}_{T}$ the
 corresponding normalized eigenvectors.  Then, the linear
 combination

 \[ \mathbf{p}_{k}'\left(
                                 \begin{array}{c}
                                   X_{1} \\
                                   \vdots \\
                                   X_{P} \\
                                 \end{array}
                               \right) \]

                               \noindent is called the
                               $k^{th}$ \emph{discriminant
                               function}.
It has the valuable property of maximizing the univariate $F$-ratio
subject to being uncorrelated with the earlier linear combinations.
A variety of applications of discriminant functions exists in
classification that we will come back to later.  Also, standard
multivariate ANOVA significance testing is based on various
functions of the eigenvalues $\lambda_{1}, \ldots, \lambda_{T}$ and
their derived sampling distributions.

\subsection{Canonical Correlation}

Suppose the collection of $P$ random variables that we have observed
over the $N$ subjects is actually in the form of two ``batteries,''
$X_{1}, \ldots, X_{Q}$ and $X_{Q+1}, \ldots, X_{P}$, and the
observed covariance matrix $\mathbf{A}_{P \times P}$ is partitioned
into four parts:

\[ \mathbf{A}_{P \times P} = \left(
                               \begin{array}{cc}
                                 \mathbf{A}_{11} &  \mathbf{A}_{12} \\
                                  \mathbf{A}_{12}' &  \mathbf{A}_{22} \\
                               \end{array}
                             \right) \]

\noindent where $\mathbf{A}_{11}$ is $Q \times Q$ and represents the
observed covariances among the variables in the first battery;
$\mathbf{A}_{22}$ is $(P-Q) \times (P-Q)$ and represents the
observed covariances among the variables in the second battery;
$\mathbf{A}_{12}$ is $Q \times (P-Q)$ and represents the observed
covariances between the variables in the first and second batteries.
Consider the following two equations in unknown vectors $\mathbf{a}$
and $\mathbf{b}$, and unknown scalar $\lambda$:

\[ \mathbf{A}_{11}^{-1} \mathbf{A}_{12} \mathbf{A}_{22}^{-1}
\mathbf{A}_{12}' \mathbf{a} = \lambda \mathbf{a} \]

\[ \mathbf{A}_{22}^{-1} \mathbf{A}_{12}' \mathbf{A}_{11}^{-1}
\mathbf{A}_{12} \mathbf{b} = \lambda \mathbf{b} \]

\noindent There are $T$ solutions to these expressions (for $T =
\min(Q,(P-Q))$), given by normalized unit-length vectors,
$\mathbf{a}_{1}, \ldots, \mathbf{a}_{T}$ and $\mathbf{b}_{1},
\ldots, \mathbf{b}_{T}$; and a set of common $\lambda_{1} \ge \cdots
\ge \lambda_{T} \ge 0$.

\smallskip

The linear combinations of the first and second batteries defined by
$\mathbf{a}_{k}$ and $\mathbf{b}_{k}$ are the $k^{th}$
\emph{canonical variates} and have squared correlation of
$\lambda_{k}$; they are uncorrelated with all other canonical
variates (defined either in the first or second batteries).  Thus,
$\mathbf{a}_{1}$ and $\mathbf{b}_{1}$ are the first canonical
variates with squared correlation of $\lambda_{1}$; among all linear
combinations defined by unit-length vectors for the variables in the
two batteries, this squared correlation is the highest it can be.
(We note that the coefficient matrices $\mathbf{A}_{11}^{-1}
\mathbf{A}_{12} \mathbf{A}_{22}^{-1} \mathbf{A}_{12}'$ and
$\mathbf{A}_{22}^{-1} \mathbf{A}_{12}' \mathbf{A}_{11}^{-1}
\mathbf{A}_{12}$ are not symmetric; thus, special symmetrizing and
equivalent equation systems are typically used to obtain the
solutions to the original set of expressions.)


\subsection{Algebraic Restrictions on Correlations}

A matrix $\mathbf{A}_{P \times P}$ that represents a covariance
matrix among a collection of random variables, $X_{1}, \ldots,
X_{P}$ is p.s.d.; and conversely, any p.s.d. matrix represents the
covariance matrix for some collection of random variables.  We
partition $\mathbf{A}$ to isolate its last row and column as

\[ \mathbf{A} =  \left(
     \begin{array}{cc}
       \mathbf{B}_{(P-1) \times (P-1)} & \mathbf{g}_{(P-1) \times 1} \\
       \mathbf{g}' & a_{PP} \\
     \end{array}
   \right) \]

   \noindent $\mathbf{B}$ is the $(P-1) \times (P-1)$ covariance
   matrix among the variables $X_{1}, \ldots, X_{P-1}$; $\mathbf{g}$
   is $(P-1) \times 1$ and
   contains the cross-covariance between the the first $P-1$
   variables and the $P^{th}$; $a_{PP}$ is the variance for the
   $P^{th}$ variable.

   \smallskip

   Based on the observation that determinants of p.s.d. matrices are
   nonnegative, and a result on expressing determinants for partitioned
   matrices (that we do not give here), it must be true that

   \[ \mathbf{g}'\mathbf{B}^{-1}\mathbf{g} \le a_{PP} \]

   \noindent or if we think correlations rather than merely
   covariances (so the main diagonal of $\mathbf{A}$ consists of
   all ones):

   \[ \mathbf{g}'\mathbf{B}^{-1}\mathbf{g} \le 1 \]

   \noindent Given the correlation matrix
   $\mathbf{B}$,  the possible values the correlations
   in $\mathbf{g}$ could have are in or on the ellipsoid defined in $P-1$ dimensions by
   $\mathbf{g}'\mathbf{B}^{-1}\mathbf{g} \le 1$. The important point
   is that we do not have a ``box'' in $P-1$ dimensions containing
   the correlations with sides extending the whole range of $\pm 1$;
   instead, some restrictions are placed on the observable
   correlations that gets defined by the size of the correlation in $\mathbf{B}$.  For example, when
   $P
   = 3$, a correlation between variables $X_{1}$ and $X_{2}$ of $r_{12} = 0$ gives the ``degenerate''
   ellipse of a circle for constraining the correlation values between $X_{1}$ and $X_{2}$
   and the third variable $X_{3}$ (in a two-dimensional $r_{13}$ versus $r_{23}$ coordinate system);
    for $r_{12} = 1$,
    the ellipse flattens to a
   line in this same two-dimensional space.

   \smallskip

   Another algebraic restriction that can be seen immediately is
   based on the formula for the partial correlation between
   two variables,  ``holding the third constant'':

   \[ \frac{r_{12} -
   r_{13}r_{23}}{\sqrt{(1-r_{13}^{2})(1-r_{23}^{2})}} \]

   \noindent Bounding the above by $\pm 1$ (because it is a correlation) and ``solving'' for
   $r_{12}$, gives the algebraic  upper and lower bounds of

   \[ r_{12} \le r_{13}r_{23} +  \sqrt{(1-r_{13}^{2})(1-r_{23}^{2})} \]

   \[  r_{13}r_{23} - \sqrt{(1-r_{13}^{2})(1-r_{23}^{2})} \le r_{12} \]


\subsection{The Biplot}

Let $\mathbf{A} = \{a_{ij}\}$ be an $n \times m$ matrix of rank $r$.  We wish to find
a second matrix $\mathbf{B} = \{b_{ij}\}$ of the same size, $n \times m$, but of rank
$t$, where $t \le r$, such that the least squares criterion, $\sum_{i,j} (a_{ij} -
b_{ij})^{2}$, is as small as possible overall all matrices of rank $t$.

\smallskip

The solution is to first find the singular value decomposition of $\mathbf{A}$ as
$\mathbf{UDV}'$, where $\mathbf{U}$ is $n \times r$ and has orthonormal columns,
$\mathbf{V}$ is $m \times r$ and has orthonormal columns, and $\mathbf{D}$ is $r
\times r$, diagonal, with positive values $d_{1} \ge d_{2} \ge \cdots \ge d_{r} > 0$
along the main diagonal.  Then, $\mathbf{B}$ is defined as $\mathbf{U^*D^*V^*}'$,
where we take the first $t$ columns of $\mathbf{U}$ and $\mathbf{V}$ to obtain
$\mathbf{U^*}$ and $\mathbf{V^*}$, respectively, and the first $t$ values, $d_{1} \ge
\cdots \ge d_{t}$, to form a diagonal matrix $\mathbf{D^*}$.

\smallskip

The approximation of $\mathbf{A}$ by a rank $t$ matrix $\mathbf{B}$, has been one
mechanism for representing the row and column objects defining $\mathbf{A}$ in a
low-dimensional space of dimension $t$ through what can be generically labeled as a
biplot (the prefix ``bi'' refers to the representation of both the row and column
objects together in the same space).  Explicitly, the approximation of $\mathbf{A}$
and $\mathbf{B}$ can be written as \[ \mathbf{B} = \mathbf{U^*D^*V^*}' =
\mathbf{U^*D^{*\alpha}D^{(1-\alpha)}V^*}' = \mathbf{PQ}' \ ,\] \noindent where
$\alpha$ is some chosen number between 0 and 1, $\mathbf{P} = \mathbf{U^*D^{*\alpha}}$
and is $n \times t$, $\mathbf{Q} = (\mathbf{D^{(1-\alpha)}V^*}')'$ and is $m \times
t$.

\smallskip

The entries in $\mathbf{P}$ and $\mathbf{Q}$ define coordinates for the row and column
objects in a $t$-dimensional space that, irrespective of the value of $\alpha$ chosen,
have the following characteristic:

\smallskip

If a vector is drawn from the origin through the $i^{th}$ row point and the $m$ column
points are projected onto this vector, the collection of such projections is
proportional to the $i^{th}$ row of the approximating matrix $\mathbf{B}$.  The same
is true for projections of row points onto vectors from the origin through each of the
column points.

\subsection{The Procrustes Problem}


Procrustes (the subduer), son of Poseidon, kept an inn benefiting from what he claimed
to be a wonderful all-fitting bed.  He lopped off excessive limbage from tall guests
and either flattened short guests by hammering or stretched them by racking. The
victim fitted the bed perfectly but, regrettably, died.  To exclude the embarrassment
of an initially exact-fitting guest, variants of the legend allow Procrustes two,
different-sized beds.  Ultimately, in a crackdown on robbers and monsters, the young
Theseus fitted Procrustes to his own bed. (Gower and Dijksterhuis, 2004)

\smallskip

Suppose we have two matrices, $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$, each considered
(for convenience) to be of the same size, $n \times p$.  If you wish, $\mathbf{X}_{1}$
and $\mathbf{X}_{2}$ can be interpreted as two separate $p$-dimensional coordinate
sets for the same set of $n$ objects.  Our task is to match these two configurations
optimally, with the criterion being least-squares: find a transformation matrix,
$\mathbf{T}_{p \times p}$, such that $\parallel \mathbf{X}_{1}\mathbf{T} -
\mathbf{X}_{2} \parallel$ is minimized, where $\parallel \cdot \parallel$ denotes the
sum-of-squares of the incorporated matrix, i.e., if $\mathbf{A} = \{a_{uv}\}$, then
$\parallel \mathbf{A}
\parallel$ = $ \mathrm{trace}(\mathbf{A}'\mathbf{A})$ = $\sum_{u,v} a_{uv}^2$.  For
convenience, assume both $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$ have been normalized so
$\parallel \mathbf{X}_{1} \parallel$ = $\parallel \mathbf{X}_{2} \parallel$ = 1, and
the columns of $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$ have sums of zero.

\smallskip

Two results are central:

\smallskip

(a) When $\mathbf{T}$ is unrestricted, we have the multivariate multiple regression
solution \[ \mathbf{T}^{*} =
(\mathbf{X}_{1}'\mathbf{X}_{1})^{-1}\mathbf{X}_{1}'\mathbf{X}_{2} \ ; \]

(b) When $\mathbf{T}$ is orthogonal, we have the Sch\"{o}nemann solution done for his
thesis in the Quantitative Division at Illinois in 1965 (published in
\emph{Psychometrika} in 1966):

\smallskip

for the SVD of $\mathbf{X}_{2}' \mathbf{X}_{1}$ = $\mathbf{U}\mathbf{S}\mathbf{V}'$,
we let $\mathbf{T}^{*} = \mathbf{V} \mathbf{U}'$.

\subsection{Matrix Rank Reduction}


     Lagrange's Theorem (as inappropriately named by C. R. Rao, because it should really be attributed to Guttman) can be stated as follows:


\smallskip


Let $\mathbf{G}$ be a nonnegative-definite (i.e., a symmetric positive semi-definite) matrix of order  $n \times  n$ and of rank $r >
0$.  Let $\mathbf{B}$ be of order $n \times  s$ and such that $\mathbf{B'GB}$ is non-singular.  Then the residual matrix
\begin{eqnarray}
          \mathbf{G_{1} = G - GB(B'GB)^{-1}B'G}
\end{eqnarray}
\noindent is of rank $r-s$ and is nonnegative definite.


\smallskip


\noindent Intuitively, this theorem allows you to ``take out'' ``factors'' from a covariance (or correlation) matrix.

\smallskip

There are two somewhat more general results (from Guttman) on matrix rank reduction that prove useful:

\smallskip

\noindent Let $\mathbf{S}$ be any matrix of order $n \times N$ and of rank $r > 0$.  Let $\mathbf{X}$ and $\mathbf{Y}$ be of orders $s \times n$ and $s \times N$, respectively (where $s \le r$), and such that $\mathbf{XSY}'$ is nonsingular.  Then the residual matrix \[ \mathbf{S}_{1} = \mathbf{S} - \mathbf{S}\mathbf{Y}'(\mathbf{X}\mathbf{S}\mathbf{Y}')^{-1}\mathbf{X}\mathbf{S} \] is exactly of rank $r-s$.

\smallskip

\noindent If $\mathbf{S}$ is of order $n \times N$ and of rank $r$, $\mathbf{F}$ of order $n \times r$ (and of rank $r$), and $\mathbf{SS}' = \mathbf{FF}'$, then there is a unique matrix $\mathbf{P}$ of order $r \times N$ such that \[ \mathbf{S} = \mathbf{FP} \ .\] The matrix $\mathbf{P} = (\mathbf{F}'\mathbf{F})^{-1}\mathbf{F}'\mathbf{S}$ satisfies $\mathbf{PP}' = \mathbf{I}$ (i.e., $\mathbf{P}$ has orthonormal rows).


\subsection{Torgerson Metric Multidimensional Scaling}


     Let $\mathbf{A}$ be a symmetric matrix of order $n \times n$.  Suppose we want to
find a matrix $\mathbf{B}$ of rank 1 (of order $n \times n$) in such a way that the sum of the squared
discrepancies between the elements of $\mathbf{A}$ and the corresponding
elements of $\mathbf{B}$ (i.e., $\sum_{j=1}^{n} \sum_{i=1}^{n} (a_{ij} - b_{ij})^{2}$) is at a minimum.
  It can be shown that the solution
is $\mathbf{B} = \lambda \mathbf{kk'}$ (so all columns in $\mathbf{B}$ are multiples of $\mathbf{k}$),
 where $\lambda$   is the largest eigenvalue of $\mathbf{A}$\
 and $\mathbf{k}$ is the
corresponding normalized eigenvector.  This theorem can be generalized.  Suppose we take the
first $r$ largest eigenvalues and the corresponding normalized
eigenvectors.  The eigenvectors are collected in an $n \times r$ matrix
 $\mathbf{K} = \{\mathbf{k}_{1},\ldots,\mathbf{k}_{r}\}$
and the eigenvalues in a diagonal matrix $\mathbf{\Lambda}$.  Then $\mathbf{K \Lambda  K'}$  is an $n \times n$
matrix of rank $r$ and is a least-squares solution for the
approximation of $\mathbf{A}$ by a matrix of rank $r$.  It is assumed, here,
that the eigenvalues are all positive.  If $\mathbf{A}$ is of rank $r$ by itself
and we take the $r$ eigenvectors for which the eigenvalues are
different from zero collected in a matrix $\mathbf{K}$ of order $n \times r$, then $\mathbf{A}$ =
 $\mathbf{K \Lambda K'}$.  Note that $\mathbf{A}$ could also be represented by $\mathbf{A} = \mathbf{LL'}$,
  where
$\mathbf{L} = \mathbf{K \Lambda^{1/2}}$ (we factor the matrix), or as a sum of $r$ $n \times n$
 matrices --- $\mathbf{A} = \lambda_{1} \mathbf{k_{1}k_{1}'} + \cdots + \lambda_{r} \mathbf{k_{r}k_{r}'}$.

\smallskip

\emph{Metric Multidimensional Scaling} -- Torgerson's Model (Gower's
Principal Coordinate Analysis)

     Suppose I have a set of $n$ points that can be perfectly
represented spatially in $r$ dimensional space.  The $i^{th}$ point has
coordinates $(x_{i1},x_{i2},\ldots,x_{ir})$.  If
$d_{ij} = \sqrt{\sum_{k=1}^{r} (x_{ik} - x_{jk})^{2}}$ represents the Euclidean
distance between points $i$ and $j$, then
\[ d_{ij}^{*} = \sum_{k=1}^{r} x_{ik}x_{jk}, \mathrm{where} \]
\begin{equation}
 d_{ij}^{*} = - \frac{1}{2} (d_{ij}^{2} - A_{i} - B_{j} + C) ;
\end{equation}

 \[   A_{i} = (1/n)\sum_{j=1}^{n} d_{ij}^{2} ; \]

   \[   B_{j} = (1/n)\sum_{i=1}^{n} d_{ij}^{2} ; \]

  \[   C = (1/n^{2})\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}^{2} . \]


Note that $\{d_{ij}^{*}\}_{n \times n} = \mathbf{XX'}$, where $\mathbf{X}$ is of order $n \times r$ and the entry in the
$i^{th}$ row and $k^{th}$ column is $x_{ik}$.

So, the Question:  If I give you $\mathbf{D} = \{d_{ij}\}_{n \times n}$, find me \emph{a} set of coordinates to do it.  The Solution: Find $\mathbf{D}^{*} = \{d_{ij}^{*}\}$, and take its Spectral Decomposition.  This is \emph{exact} here.

\smallskip

     To use this result to obtain a spatial representation for a
set of $n$ objects given \emph{any} ``distance-like'' measure, $p_{ij}$, between
objects $i$ and $j$, we proceed as follows:

     (a)  Assume (i.e., pretend) the Euclidean model holds for $p_{ij}$.

     (b)  Define $p_{ij}^{*}$ from $p_{ij}$ using (2).

     (c)  Obtain a spatial representation for $p_{ij}^{*}$ using a suitable value
     for $r$, the number of dimensions (at most, $r$ can be no larger
     than the number of positive eigenvalues for $\{p_{ij}^{*} \}_{n \times n}$):

                   \[ \{p_{ij}^{*} \} \approx \mathbf{XX'}  \]

     (d)  Plot the $n$ points in $r$ dimensional space.

 \subsection{A Guttman Multidimensional Scaling Result}

 If $\mathbf{B}$ is a symmetric matrix of order $n$, having all its elements non-negative, the following quadratic form defined by the matrix $\mathbf{A}$ must be positive semi-definite:

\[ \sum_{i,j} b_{ij}(x_{i} - x_{j})^{2} = \sum_{i,j} x_{i} a_{ij} x_{j} , \]

where

\[  a_{ij} = \left\{ \begin{array}{cc}

                      \sum_{k=1; k \ne i}^{n} b_{ik} & (i = j) \\

                       -b_{ij}                      & (i \ne j) \\


                       \end{array} \right. \]


\noindent If all elements of $\mathbf{B}$ are positive, then $\mathbf{A}$ is of rank $n - 1$, and has one smallest eigenvalue equal to zero with an associated eigenvector having all constant elements.
Because all (other) eigenvectors must be orthogonal to the constant eigenvector, the entries in these other eigenvectors must sum to zero.

\smallskip

This Guttman result can be used for a method of multidimensional scaling (mds), and is one that seems to get reinvented periodically in the literature. Generally, this method has been used to provide rational starting points in iteratively-defined nonmetric mds.  More recently, the Guttman strategy (although not attributed to him as such) has been applied to graphs and the corresponding 0/1 adjacency matrix (treated as a similarity measure).  In this case, we have what are called Laplacian eigenmaps, where the graphs are imbedded into a space by using the coordinates from the \emph{smallest} nonzero eigenvectors.

\subsection{A Few General MATLAB Routines to Know About}

For Eigenvector/Eigenvalue Decompositions:

\smallskip


$ [\mathbf{V},\mathbf{D}] = \mathrm{eig}(\mathbf{A})$, where $\mathbf{A}$ = $\mathbf{V}\mathbf{D}\mathbf{V}'$, for $\mathbf{A}$ square; $\mathbf{V}$ is orthogonal and contains eigenvectors (as columns); $\mathbf{D}$ is diagonal and contains the eigenvalues.

\smallskip

For Singular Value Decompositions:

\smallskip

$ [\mathbf{U},\mathbf{S},\mathbf{V}] = \mathrm{svd}(\mathbf{B})$, where $\mathbf{B}$ = $\mathbf{U}\mathbf{S}\mathbf{V}'$; the columns of $\mathbf{U}$ and the rows of $\mathbf{V}'$ are orthonormal; $\mathbf{S}$ is diagonal and contains the non-negative singular values (ordered from \emph{largest to smallest}).

\smallskip

The help comments for the Procrustes routine in the Statistics Toolbox are given verbatim below.  Note the very general transformation provided in the form of a MATLAB Structure that involves optimal rotation, translation, and scaling.

\small
\begin{verbatim}

 help procrustes
 procrustes Procrustes Analysis
    D = procrustes(X, Y) determines a linear transformation (translation,
    reflection, orthogonal rotation, and scaling) of the points in the
    matrix Y to best conform them to the points in the matrix X.  The
    "goodness-of-fit" criterion is the sum of squared errors.  procrustes
    returns the minimized value of this dissimilarity measure in D.  D is
    standardized by a measure of the scale of X, given by

       sum(sum((X - repmat(mean(X,1), size(X,1), 1)).^2, 1))

    i.e., the sum of squared elements of a centered version of X.  However,
    if X comprises repetitions of the same point, the sum of squared errors
    is not standardized.

    X and Y are assumed to have the same number of points (rows), and
    procrustes matches the i'th point in Y to the i'th point in X.  Points
    in Y can have smaller dimension (number of columns) than those in X.
    In this case, procrustes adds columns of zeros to Y as necessary.

    [D, Z] = procrustes(X, Y) also returns the transformed Y values.

    [D, Z, TRANSFORM] = procrustes(X, Y) also returns the transformation
    that maps Y to Z.  TRANSFORM is a structure with fields:
       c:  the translation component
       T:  the orthogonal rotation and reflection component
       b:  the scale component
    That is, Z = TRANSFORM.b * Y * TRANSFORM.T + TRANSFORM.c.

    [...] = procrustes(..., 'Scaling',false) computes a procrustes solution
    that does not include a scale component, that is, TRANSFORM.b == 1.
    procrustes(..., 'Scaling',true) computes a procrustes solution that
    does include a scale component, which is the default.

    [...] = procrustes(..., 'Reflection',false) computes a procrustes solution
    that does not include a reflection component, that is, DET(TRANSFORM.T) is
    1.  procrustes(..., 'Reflection','best') computes the best fit procrustes
    solution, which may or may not include a reflection component, 'best' is
    the default.  procrustes(..., 'Reflection',true) forces the solution to
    include a reflection component, that is, DET(TRANSFORM.T) is -1.

    Examples:

       % Create some random points in two dimensions
       n = 10;
       X = normrnd(0, 1, [n 2]);

       % Those same points, rotated, scaled, translated, plus some noise
       S = [0.5 -sqrt(3)/2; sqrt(3)/2 0.5]; % rotate 60 degrees
       Y = normrnd(0.5*X*S + 2, 0.05, n, 2);

       % Conform Y to X, plot original X and Y, and transformed Y
       [d, Z, tr] = procrustes(X,Y);
       plot(X(:,1),X(:,2),'rx', Y(:,1),Y(:,2),'b.', Z(:,1),Z(:,2),'bx');

\end{verbatim}


\end{document}