\documentclass[12pt]{report}

\setlength{\textwidth}{6.5in}
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{0.0in}
\setlength{\oddsidemargin}{0.0in}


\usepackage{graphics}
\usepackage{curves}
\usepackage{tikz}

\begin{document}

Version: \today

\Huge

\begin{center}

\textbf{Notes on Canonical Correlation}

\end{center}


\Large



\bigskip

Suppose we have a collection of random variables in a $(q+p) \times 1$ vector  $\mathbf{X}$ that we partition in the following form (and supposing without loss of generality that $p \le q$): \[ \mathbf{X} = \left( \begin{array}{c}
X_{1} \\
\vdots \\
X_{p} \\
---  \\
X_{p+1} \\
\vdots \\
X_{p+q} \\
\end{array} \right) = \left( \begin{array}{c}
\mathbf{X}_{1} \\
--- \\
\mathbf{X}_{2} \\
\end{array} \right) \sim \mathrm{MVN}(\mbox{\boldmath $\mu$}, \mbox{\boldmath $\Sigma$}) \ , \] where \[ \mbox{\boldmath $\mu$} = \left( \begin{array}{c}
\mbox{\boldmath $\mu$}_{1} \\
\mbox{\boldmath $\mu$}_{2} \end{array} \right) \ ; \  \mbox{\boldmath $\Sigma$} = \left( \begin{array}{cc}
 \mbox{\boldmath $\Sigma$}_{11} & \mbox{\boldmath $\Sigma$}_{12} \\
  \mbox{\boldmath $\Sigma$}_{21} & \mbox{\boldmath $\Sigma$}_{22} \end{array} \right) \ , \] with $\mbox{\boldmath $\Sigma$}_{21} =  \mbox{\boldmath $\Sigma$}_{12}'$, so that \[ \mathrm{Cor}(\mathbf{a}'\mathbf{X}_{1}, \mathbf{b}'\mathbf{X}_{2}) = \frac{\mathbf{a}'  \mbox{\boldmath $\Sigma$}_{12} \mathbf{b}}{\sqrt{ \mathbf{a}'  \mbox{\boldmath $\Sigma$}_{11} \mathbf{a}} \, \sqrt{ \mathbf{b}'  \mbox{\boldmath $\Sigma$}_{22} \mathbf{b}}} \ . \]

  \bigskip

  Suppose \[  \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mathbf{a} = \lambda \mathbf{a} \ , \] with roots $\lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{p} \ge 0$, and corresponding eigenvectors $\mathbf{a}_{1}, \ldots, \mathbf{a}_{p}$. Also, let \[ \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mathbf{b} = \lambda \mathbf{b} \ , \] with the same roots $\lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{p} \ge 0$ and, when $q > p$, the additional zero roots $\lambda_{p+1} = \cdots = \lambda_{q} = 0$; the eigenvectors corresponding to the first $p$ roots are $\mathbf{b}_{1}, \ldots, \mathbf{b}_{p}$.

  \bigskip

  Looking at the two linear combinations, $\mathbf{a}_{i}'\mathbf{X}_{1}$ (called the $i^{\mathrm{th}}$ canonical variate in the first set), and $\mathbf{b}_{i}'\mathbf{X}_{2}$ (called the $i^{\mathrm{th}}$ canonical variate in the second set), the squared correlation between them is $\lambda_{i}$;  the $i^{\mathrm{th}}$ canonical correlation is $\sqrt{\lambda_{i}}$.  The maximum correlation between any linear combination of $\mathbf{X}_{1}$ and any linear combination of $\mathbf{X}_{2}$ is $\sqrt{\lambda_{1}}$, and is obtained for $\mathbf{a}_{1}$ and  $\mathbf{b}_{1}$.  For $i > 1$, the variates $\mathbf{a}_{i}'\mathbf{X}_{1}$ and $\mathbf{b}_{i}'\mathbf{X}_{2}$ are uncorrelated with every canonical variate up to that point, and maximize the correlation subject to that restriction.
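
  \bigskip

  As a small illustration (the values here are chosen purely for convenience and are not taken from any data set), let $p = q = 2$ with \[ \mbox{\boldmath $\Sigma$}_{11} = \mbox{\boldmath $\Sigma$}_{22} = \mathbf{I} \ ; \ \mbox{\boldmath $\Sigma$}_{12} = \left( \begin{array}{cc}
  0.8 & 0 \\
  0 & 0.5 \end{array} \right) \ . \] Then \[ \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' = \left( \begin{array}{cc}
  0.64 & 0 \\
  0 & 0.25 \end{array} \right) \ , \] so $\lambda_{1} = 0.64$ and $\lambda_{2} = 0.25$, with $\mathbf{a}_{1} = \mathbf{b}_{1} = (1, 0)'$ and $\mathbf{a}_{2} = \mathbf{b}_{2} = (0, 1)'$.  The canonical correlations are $\sqrt{0.64} = 0.8$ and $\sqrt{0.25} = 0.5$; the first pair of canonical variates is $(X_{1}, X_{3})$ and the second pair is $(X_{2}, X_{4})$.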

  \bigskip

  Points to make:

  \bigskip

  (a)  The matrices $\mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}'$ and  $\mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12}$ are not symmetric, so the standard eigenvector/eigenvalue decompositions for symmetric matrices do not apply directly.  However, the two matrices \[\mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}'  \mbox{\boldmath $\Sigma$}_{11}^{-1/2}\] and \[\mbox{\boldmath $\Sigma$}_{22}^{-1/2} \mbox{\boldmath $\Sigma$}_{12}' \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1/2}\] are symmetric.  Also, \[\mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}'  \mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mathbf{e}_{i} = \lambda_{i} \mathbf{e}_{i} \ , \] and \[ \mbox{\boldmath $\Sigma$}_{22}^{-1/2} \mbox{\boldmath $\Sigma$}_{12}' \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1/2} \mathbf{f}_{i} = \lambda_{i} \mathbf{f}_{i} \ , \] where the roots $\lambda_{i}$ are the same as before.  We can then obtain $\mathbf{a}_{i} = \mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mathbf{e}_{i}$, and $\mathbf{b}_{i} = \mbox{\boldmath $\Sigma$}_{22}^{-1/2} \mathbf{f}_{i}$.  Both $\mbox{\boldmath $\Sigma$}_{11}^{-1/2}$ and $\mbox{\boldmath $\Sigma$}_{22}^{-1/2}$ are constructed from the spectral decompositions $\mbox{\boldmath $\Sigma$}_{11} = \mathbf{P} \mathbf{D} \mathbf{P}'$ and  $\mbox{\boldmath $\Sigma$}_{22} = \mathbf{Q} \mathbf{F} \mathbf{Q}'$ as $\mbox{\boldmath $\Sigma$}_{11}^{-1/2} = \mathbf{P} \mathbf{D}^{-1/2} \mathbf{P}'$ and $\mbox{\boldmath $\Sigma$}_{22}^{-1/2} = \mathbf{Q} \mathbf{F}^{-1/2} \mathbf{Q}'$.
Note the resulting normalizations when $\mathbf{e}_{i}$ and $\mathbf{f}_{i}$ are scaled to unit length: $\mathrm{Var}(\mathbf{a}_{i}'\mathbf{X}_{1}) = \mathbf{a}_{i}' \mbox{\boldmath $\Sigma$}_{11} \mathbf{a}_{i} = \mathbf{e}_{i}' \mbox{\boldmath $\Sigma$}_{11}^{-1/2}\mbox{\boldmath $\Sigma$}_{11} \mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mathbf{e}_{i} = \mathbf{e}_{i}'\mathbf{e}_{i} = 1$ and, similarly, $\mathrm{Var}(\mathbf{b}_{i}' \mathbf{X}_{2}) = 1$.
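
  \bigskip

  One way to see why the roots agree (a short verification using only the definitions above): premultiplying the symmetric eigenvalue equation for $\mathbf{e}_{i}$ by $\mbox{\boldmath $\Sigma$}_{11}^{-1/2}$ gives \[ \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \left( \mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mathbf{e}_{i} \right) = \lambda_{i} \left( \mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mathbf{e}_{i} \right) \ , \] which is the original nonsymmetric equation with $\mathbf{a}_{i} = \mbox{\boldmath $\Sigma$}_{11}^{-1/2} \mathbf{e}_{i}$ and the same $\lambda_{i}$.  The identical argument, premultiplying the equation for $\mathbf{f}_{i}$ by $\mbox{\boldmath $\Sigma$}_{22}^{-1/2}$, recovers $\mathbf{b}_{i} = \mbox{\boldmath $\Sigma$}_{22}^{-1/2} \mathbf{f}_{i}$.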
  
  \bigskip
  
  (b) There are three different normalizations that are commonly used for $\mathbf{a}_{i}$ and $\mathbf{b}_{i}$:
  
  \bigskip
  
  (i)  leave as unit length so $\mathbf{a}_{i}'\mathbf{a}_{i} = \mathbf{b}_{i}'\mathbf{b}_{i} = 1$;
  
  \bigskip
  
  (ii) make the largest value 1.0 in both $\mathbf{a}_{i}$  and $\mathbf{b}_{i}$;
  
  \bigskip
  
  (iii) do as we did above and make $\mathbf{a}_{i}' \mbox{\boldmath $\Sigma$}_{11} \mathbf{a}_{i} = 1 = \mathbf{b}_{i}' \mbox{\boldmath $\Sigma$}_{22} \mathbf{b}_{i}$.
  
  \bigskip
  
  (c) Special cases:  When $p = 1$ and $q = 1$, $\lambda_{1}$ is the (simple) squared correlation between the two variables; when $p = 1$ and $q > 1$, $\lambda_{1}$ is a squared multiple correlation.  More generally, $\lambda_{i}$ is the squared multiple correlation of $\mathbf{a}_{i}'\mathbf{X}_{1}$ with $\mathbf{X}_{2}$, and $\mathbf{b}_{i}$ gives the regression weights up to a scale factor.
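
  \bigskip

  The regression connection can be made explicit (a sketch, assuming normalization (iii) and $\lambda_{i} > 0$).  The population regression weights for predicting $\mathbf{a}_{i}'\mathbf{X}_{1}$ from $\mathbf{X}_{2}$ are \[ \mbox{\boldmath $\Sigma$}_{22}^{-1} \mathrm{Cov}(\mathbf{X}_{2}, \mathbf{a}_{i}'\mathbf{X}_{1}) = \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mathbf{a}_{i} \ . \] Applying $\mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12}$ to this vector and using the eigenvalue equation for $\mathbf{a}_{i}$ returns $\lambda_{i} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mathbf{a}_{i}$, so it is an eigenvector for the second set; the variance of the corresponding linear combination is $\mathbf{a}_{i}' \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mathbf{a}_{i} = \lambda_{i} \mathbf{a}_{i}' \mbox{\boldmath $\Sigma$}_{11} \mathbf{a}_{i} = \lambda_{i}$.  Hence we can take \[ \mathbf{b}_{i} = \lambda_{i}^{-1/2} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}' \mathbf{a}_{i} \ , \] for which \[ \mathrm{Cor}(\mathbf{a}_{i}'\mathbf{X}_{1}, \mathbf{b}_{i}'\mathbf{X}_{2}) = \mathbf{a}_{i}' \mbox{\boldmath $\Sigma$}_{12} \mathbf{b}_{i} = \lambda_{i}^{-1/2} \lambda_{i} = \sqrt{\lambda_{i}} \ . \]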
  
  \bigskip
  
  (d)  When moving to the sample, all items have direct analogues.  The one restriction on sample size is $n \ge p + q + 1$.
  
  \bigskip
  
  (e)  Suppose the variables $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$ are transformed by nonsingular matrices, $\mathbf{A}_{p \times p}$ and $\mathbf{B}_{q \times q}$, as follows: \[ \mathbf{Y}_{1} = \mathbf{A}_{p \times p} \mathbf{X}_{1} + \mathbf{c}_{p \times 1} \] \[ \mathbf{Y}_{2} = \mathbf{B}_{q \times q} \mathbf{X}_{2} + \mathbf{d}_{q \times 1} \] The same canonical variates and correlations using $\mathbf{Y}_{1}$ and $\mathbf{Y}_{2}$ would be generated as from $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$; the weights in $\mathbf{a}_{i}$ and $\mathbf{b}_{i}$ would be on the transformed variables, obviously.  In particular, we could work with standardized variables without loss of any generality, and just use the correlation matrix.
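
  \bigskip

  The invariance can be checked directly (a brief verification; the shorthand $\mathbf{M} = \mbox{\boldmath $\Sigma$}_{11}^{-1} \mbox{\boldmath $\Sigma$}_{12} \mbox{\boldmath $\Sigma$}_{22}^{-1} \mbox{\boldmath $\Sigma$}_{12}'$ is introduced here only for convenience).  The covariance blocks for $\mathbf{Y}_{1}$ and $\mathbf{Y}_{2}$ are $\mathbf{A} \mbox{\boldmath $\Sigma$}_{11} \mathbf{A}'$, $\mathbf{A} \mbox{\boldmath $\Sigma$}_{12} \mathbf{B}'$, and $\mathbf{B} \mbox{\boldmath $\Sigma$}_{22} \mathbf{B}'$, so the matrix playing the role of $\mathbf{M}$ is \[ (\mathbf{A} \mbox{\boldmath $\Sigma$}_{11} \mathbf{A}')^{-1} (\mathbf{A} \mbox{\boldmath $\Sigma$}_{12} \mathbf{B}') (\mathbf{B} \mbox{\boldmath $\Sigma$}_{22} \mathbf{B}')^{-1} (\mathbf{A} \mbox{\boldmath $\Sigma$}_{12} \mathbf{B}')' = (\mathbf{A}')^{-1} \mathbf{M} \mathbf{A}' \ . \] This matrix is similar to $\mathbf{M}$ and therefore has the same roots $\lambda_{i}$; its eigenvectors are $(\mathbf{A}')^{-1} \mathbf{a}_{i}$, and \[ \left( (\mathbf{A}')^{-1} \mathbf{a}_{i} \right)' \mathbf{Y}_{1} = \mathbf{a}_{i}' \mathbf{A}^{-1} (\mathbf{A} \mathbf{X}_{1} + \mathbf{c}) = \mathbf{a}_{i}'\mathbf{X}_{1} + \mathbf{a}_{i}' \mathbf{A}^{-1} \mathbf{c} \ , \] which is the original canonical variate up to an additive constant.  The argument for $\mathbf{Y}_{2}$ is the same with $\mathbf{B}$ in place of $\mathbf{A}$.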
  
  \bigskip
  
  (f)  To evaluate $H_{0}: \mbox{\boldmath $\Sigma$}_{12} = \mathbf{0}$, a likelihood ratio test is available; with the $\lambda_{i}$ now denoting the sample roots, under $H_{0}$ and for large $n$ we have, approximately, \[  
  -(n-1 - (1/2)(p + q + 1)) \ \ln   \prod_{i=1}^{p} (1 - \lambda_{i}) \sim \chi_{pq}^{2} \ . \]  Also, sometimes a sequential process is used to test the remaining roots, after removing the first $k$, until nonsignificance is reached:  \[
  -(n-1 - (1/2)(p + q + 1)) \ \ln   \prod_{i=k+1}^{p} (1 - \lambda_{i}) \sim \chi_{(p-k)(q-k)}^{2} \ . \]  This latter sequential procedure is a little problematic because there is no real control over the overall significance level with this strategy.
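
  \bigskip

  As a numerical illustration of the arithmetic (all values here are hypothetical and chosen only for convenience), take $n = 50$, $p = q = 2$, and sample roots $\lambda_{1} = 0.64$, $\lambda_{2} = 0.25$.  The multiplier is $n - 1 - (1/2)(p + q + 1) = 49 - 2.5 = 46.5$, and the overall statistic is \[ -46.5 \ \ln \left[ (1 - 0.64)(1 - 0.25) \right] = -46.5 \ \ln(0.27) \approx 60.88 \ , \] referred to a $\chi^{2}$ distribution with $pq = 4$ degrees of freedom (the upper $.05$ critical value is about $9.49$).  Removing the first root ($k = 1$) gives \[ -46.5 \ \ln(1 - 0.25) = -46.5 \ \ln(0.75) \approx 13.38 \ , \] referred to a $\chi^{2}$ distribution with $(p-1)(q-1) = 1$ degree of freedom (upper $.05$ critical value about $3.84$).  In this made-up case both statistics exceed their critical values.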
  
\bigskip

Generally, the canonical weights are quite difficult to interpret substantively.  I might suggest using a constrained least-squares approach (iteratively moving from one set to the other), where the weights are forced to be nonnegative.
  
  


\end{document}