\documentclass[xcolor={pdftex,dvipsnames,table}]{beamer}

\usetheme{Berkeley}

\title[Introduction to R]{The Briefest of Introductions to R}

\author{Lawrence Hubert}

\institute{Department of Psychology \\The University of Illinois}

\date{A Programming Language and Software Environment for Statistical Computing and Graphics}

\begin{document}

\begin{frame}

\titlepage

\end{frame}








\begin{frame}

\frametitle{The R Programming Language}

Originally developed by Ross Ihaka and Robert Gentleman in the early 1990s at the University of Auckland, New Zealand.  It is now developed by the \emph{R Development Core Team}.

\bigskip

R is an implementation of the S programming language developed at Bell Laboratories by John Chambers and colleagues.  Another implementation of S is in the commercially available product, S-Plus.

\bigskip

Extra Credit Question: What other languages and operating systems were developed at Bell Labs?

\bigskip

The name `R' comes partly from the first names of the two original authors, and partly as a play on the name `S'.


\end{frame}

\begin{frame}

\frametitle{}

The main place to get everything you need about R is:

\bigskip

 http://www.r-project.org

\bigskip

and one of the CRAN mirrors (Comprehensive R Archive Network):

\bigskip

 http://cran.r-project.org

\bigskip

R is widely used for statistical software development and data analysis; it could now be considered the \emph{lingua franca} for statisticians, much as MATLAB is viewed in the engineering and computer science communities.

\end{frame}

\begin{frame}

\frametitle{}

R is ``state of the art''; (almost all) statistical researchers provide their ``cutting edge'' methods as R packages.

\bigskip

The source code is freely available under the GNU General Public License; also, precompiled binary versions are available (and free) for Windows, Mac OS X, and Linux.

\bigskip

Generally, R uses a command line interface, but several GUIs are available.  We will introduce one done by John Fox, called R Commander, towards the end of our session.

\bigskip

If you need a GUI to do any of the analyses discussed, say, in 406/7, then R Commander is for you.

\end{frame}

\begin{frame}

\frametitle{}

R supports a large variety of statistical and numerical techniques in its (eight) base packages (e.g., the base stats  package); in fact, most standard methods (through all of standard multivariate analysis) are already available in this default installation.

\bigskip

R is also highly extensible through the use of packages --- user-submitted libraries for specific functions or specific areas of study.  We will list some later that are (most) relevant to psychology.

\bigskip

Because of its S language lineage, R has stronger object-oriented programming facilities than most statistical computing languages.

\end{frame}
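
\begin{frame}[fragile]

\frametitle{}

As a rough illustration of these object-oriented facilities, here is a minimal sketch of an S3 class with its own print method.  The class name \texttt{score} and its single field are made up for this example.

\bigskip

{\footnotesize
\begin{verbatim}
# Constructor: a list tagged with a class attribute
score <- function(x) structure(list(value = x),
                               class = "score")

# Method picked up by the generic print()
print.score <- function(x, ...) {
  cat("score:", x$value, "\n")
  invisible(x)
}

s <- score(42)
print(s)                # dispatches to print.score
inherits(s, "score")
\end{verbatim}
}

\end{frame}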

\begin{frame}

\frametitle{}

The term ``environment'' is intended to characterize R as a fully planned and coherent system, rather than as an incremental accretion of very specific and inflexible tools; the latter is frequently the case with other data analysis software (e.g., SAS, SPSS, SYSTAT, \ldots).

\bigskip

R (like S) is designed around a true computer language; it allows users to add additional functionality by defining new functions.  Much of the system is itself written in R, making it easy for users to follow the algorithmic choices made.

\bigskip

For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time.  Also, advanced users can write C code to manipulate R objects directly.

\end{frame}
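
\begin{frame}[fragile]

\frametitle{}

A small sketch of both points: typing a function's name without parentheses prints its R source, and a new function is defined in the same way.  The helper name \texttt{cv} (coefficient of variation) is made up for this example.

\bigskip

{\footnotesize
\begin{verbatim}
# Print the definition of a function written in R
sd

# Add functionality by defining a new function
cv <- function(x) sd(x) / mean(x)
cv(c(2, 4, 6))
\end{verbatim}
}

\end{frame}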

\begin{frame}

\frametitle{}

R is more than a statistics system.  It is an environment within which statistical techniques are implemented.

\bigskip

R has its own \LaTeX(-like) documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.

\bigskip

Generally, a statistical package such as SPSS is oriented toward combining instructions and rectangular data sets to produce (voluminous) printout and graphs. Routine, standard data analysis is easy; innovative or nonstandard analysis is hard or impossible.

\end{frame}

\begin{frame}

\frametitle{}

A programming environment is oriented toward transforming one data structure into another.  Programming environments, such as R (or S), are extensible; standard data analysis is easy, but so are innovation and nonstandard analysis.

\bigskip

One of R's strengths is its graphical capabilities, which produce publication-quality graphs that include mathematical symbols.

\bigskip

Although R is mostly used by statisticians and other practitioners requiring a complete environment for statistical computation and software development, it can also be used as a general matrix manipulation and calculation toolbox.

 \end{frame}
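
\begin{frame}[fragile]

\frametitle{}

As one possible illustration of mathematical symbols in a graph, base graphics accept \texttt{expression()} objects for labels and titles.  The curve plotted (the standard normal density) is just a convenient choice.

\bigskip

{\footnotesize
\begin{verbatim}
x <- seq(-3, 3, length.out = 200)

plot(x, dnorm(x), type = "l",
     xlab = expression(x),
     ylab = expression(phi(x)),
     main = expression(phi(x) ==
       frac(1, sqrt(2 * pi)) * e^{-x^2 / 2}))
\end{verbatim}
}

\end{frame}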

 \begin{frame}

 \frametitle{}

  Such calculator usage is much like MATLAB (and the free GNU Octave).  This will not be our emphasis here, however.  Generally, I prefer MATLAB for this one-off calculator-type task that deals explicitly with matrices.

  \bigskip

  Using R to do some matrix manipulations is fairly straightforward, and comparable to how MATLAB does things.  You may need some of this facility, however, if you ever wish to write your own functions, and, ultimately, your own packages.
  
  \bigskip
  
  As documentation, see Bill Revelle's matrix\_algebra\_in\_r.pdf at the class web site.

\end{frame}
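
\begin{frame}[fragile]

\frametitle{}

A brief sketch of the kind of matrix manipulation meant here, using only base R.  The matrix \texttt{A} and vector \texttt{b} are arbitrary values chosen for illustration.

\bigskip

{\footnotesize
\begin{verbatim}
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # filled by column
b <- c(1, 2)

t(A)               # transpose
A %*% A            # matrix product (* is elementwise)
solve(A)           # inverse
x <- solve(A, b)   # solves A x = b
eigen(A)$values    # eigenvalues
\end{verbatim}
}

\end{frame}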


\begin{frame}

\frametitle{Packages}

The capabilities of R are extended by user-contributed \emph{packages} (comparable to Toolboxes in MATLAB), allowing specialized statistical techniques, graphical devices, as well as programming interfaces and import/export capabilities to many external data formats.

\bigskip

A core set of packages is included with the installation of R, with over 1000 more available at a CRAN site.

\bigskip

Notable packages are listed along with comments on the official R Task View pages:

\bigskip

 http://cran.r-project.org/web/views

\end{frame}

\begin{frame}

\frametitle{}

Task views of particular interest are: Cluster; Environmetrics; Multivariate; Psychometrics; Social Sciences

\bigskip

The Bioinformatics community has started a successful effort to use R for the analysis of data from molecular biology laboratories.

\bigskip

The Bioconductor Project, started in 2001, provides R packages for the analysis of genomic data, including object-oriented data handling and analysis tools for Affymetrix and cDNA microarrays.

\bigskip

 http://www.bioconductor.org

\end{frame}

\begin{frame}

\frametitle{}

Jonathan Baron (U Penn, Psychology) has a very nice R help page.  It gives a complete list of all packages, plus a search facility to look for what you might need:

\bigskip

 http://finzi.psych.upenn.edu

\bigskip

Also, remember the R project page and the CRAN mirrors:

\bigskip

 http://www.r-project.org

\bigskip

 http://cran.r-project.org

\end{frame}

\begin{frame}

\frametitle{}


The \emph{Journal of Statistical Software} is an on-line resource founded in 1996 by the American Statistical Association as a freely available and peer-reviewed resource for statistical software and algorithms.

\bigskip

The site listed below for \emph{JSS} is devoted to an open-source philosophy.  Thus, for both articles and code snippets, the source code is published along with the paper:

\bigskip

 http://www.jstatsoft.org

\bigskip

Although code implementations can use different languages or computing environments, the emphasis is mainly on R, and, to a lesser extent, MATLAB.


\end{frame}

\begin{frame}

\frametitle{}

Several other good sources of R-related material:

\bigskip

\emph{R News} (now \emph{The R Journal}) was the newsletter of the R project; it features short to medium length articles covering topics that might be of interest to users or developers of R.  It is all free, and on-line as pdfs:

\bigskip

 http://journal.r-project.org

\bigskip

Bill Revelle (Northwestern, Psychology) maintains the Personality Project, and a very good introduction to R for psychological research:

\bigskip

 http://www.personality-project.org/r/r.guide.html

\end{frame}

\begin{frame}

\frametitle{Some Contributed Packages of Particular Interest in Psychology}

See \emph{R News} (December, 2007) on the Psychometrics Task View; and  meta: An R Package for Meta-Analysis.

\bigskip

See the R package  Zelig  (Everyone's Statistical Software).

\bigskip

Extra Credit Question: Who was Zelig and what actor played him?

\end{frame}

\begin{frame}

\frametitle{}

Multilevel Modeling:  multilevel  and  nlme:

\bigskip

 http://cran.r-project.org/doc/contrib/Bliese\_Multilevel.pdf

\bigskip

Neural Networks:  nnet  and  AMORE  (A MORE flexible neural network package)

\bigskip

 coin: A computational framework for conditional inference (i.e., think exact tests; permutation tests; randomization tests)

\bigskip

 TSP: Infrastructure for the Traveling Salesperson Problem

\bigskip

 tree: Classification and regression trees

\end{frame}

\begin{frame}{}

 seriation: Row/column object ordering in proximity matrices

\bigskip

 sem: Structural equation models

\bigskip

 rimage: Image processing module for R

\bigskip

 mlica: Maximum likelihood implementation of Independent Components Analysis

\bigskip

 lsa: Latent semantic analysis (using document-term matrices)

\bigskip

 labdsv: Ordination (i.e., scaling) and multivariate analysis for ecology

\end{frame}

\begin{frame}

\frametitle{}

 gap: Genetic analysis package

\bigskip

 fmri: Analysis of fMRI experiments

\bigskip

 AnalyzeFMRI: Functions for analysis of fMRI data sets stored in the ANALYZE or NIFTI format

\bigskip


 ecodist: Dissimilarity-based functions for ecological analysis (contains  nmds  for non-metric multidimensional scaling)

\bigskip

 eba: Elimination-by-aspects models

\bigskip

 lpSolve: Interface to Lp\_solve for linear/integer programs

\end{frame}

\begin{frame}

\frametitle{}

 Rglpk: Linear and mixed integer programming solver using GLPK (GNU Linear Programming Kit)

\bigskip

 clValid: An R package for cluster validation (also, see the CRAN Task View on Cluster Analysis and Finite Mixture Models)

\bigskip

 clue: Cluster ensembles (contains a lot of my work on fitting ultrametrics and additive trees by least-squares iterative projection)

\bigskip

 qgen: Quantitative genetics using R

\bigskip

 psyphy: Functions for analyzing psychophysical functions

\end{frame}

\begin{frame}

\frametitle{}

 smacof: Multidimensional scaling (from Jan de Leeuw)

\bigskip



 psych: Procedures for personality and psychology research (from Bill Revelle)

\bigskip

 ade4: Analysis of ecological data: Exploratory and Euclidean methods in environmental sciences

\bigskip

 anacor: Simple and canonical correspondence analysis

\bigskip

 amap: Another multidimensional analysis package

\bigskip

 acepack:  ace  and  avas  for selecting regression transformations

\end{frame}

\begin{frame}

\frametitle{}

 e1071: Packages from Vienna, including  svm  (support vector machines) and  lca  (latent class analysis)

\bigskip

 cba: Clustering for business analytics (includes  order.optimal  for ordering the leaves of a tree --- one of my research areas as well)

\bigskip

 MASS: Package for the text \emph{Modern Applied Statistics with S} (contains  isoMDS  to carry out Kruskal's non-metric multidimensional scaling)




\end{frame}

\begin{frame}

\frametitle{Packages for Social Network Analysis}

 sna: Tools for social network analysis

\bigskip

 network: Classes for relational data

\bigskip

 latentnetHRT  and  latentnet: Latent position and cluster models for networks

\bigskip

 inetwork: Network analysis and plotting

\bigskip

 statnet: A suite of packages for network analysis:

\bigskip

 http://csde.washington.edu/statnet

\bigskip

 ergm: Exponential-family models for networks

\end{frame}

\begin{frame}

\frametitle{}

 degreenet: Models for skewed count distributions relevant to networks

\bigskip

Also, see  STOCNET  at

\bigskip

 http://stat.gamma.rug.nl/snijders/


\bigskip

This is an open software system for the advanced statistical analysis of social networks (primarily, the work of Tom Snijders).

\end{frame}

\begin{frame}

\frametitle{Gnumeric}

The Gnumeric spreadsheet is part of the GNOME (Linux) desktop environment.  GNOME is a project to create a free, user-friendly desktop environment for Linux.

\bigskip

The goal of Gnumeric is to be the best possible spreadsheet.  It will import existing Microsoft Excel files, and it won't screw up simple statistical computations (as Excel routinely does; Microsoft seems incapable of coming up with a quality product, as is continually documented in the statistical literature).

\end{frame}

\begin{frame}

\frametitle{}

There is an open-source build for Windows, as well as the various Linux versions.  See:

\bigskip

 http://www.gnome.org/projects/gnumeric

\bigskip

Gnumeric can even do linear and mixed integer programming.

\bigskip

It is an explicit friend of R.



\end{frame}

\begin{frame}

\frametitle{Using R}

One strong characteristic of R is the great help system.  Try:

\bigskip

 help(t.test)

\bigskip




 help.search("cluster")

\bigskip

Also, the manuals (running to a huge number of pages) are available as pdfs when you install R.  The contributed packages come with their own help documentation as well.


\end{frame}
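
\begin{frame}[fragile]

\frametitle{}

A few related commands that may also be worth trying; this is only a sampling of the help facilities, not a complete list.

\bigskip

{\footnotesize
\begin{verbatim}
?t.test           # shorthand for help(t.test)
??cluster         # shorthand for help.search("cluster")
example(mean)     # run the examples from a help page
apropos("test")   # object names containing "test"
\end{verbatim}
}

\end{frame}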

\begin{frame}

\frametitle{}

Before we can use R or any of the contributed programs, we need to get our data (usually, a subject by variable rectangular matrix) into what is called a \emph{data frame}.

\bigskip

Here are a number of ways of doing this:

\bigskip

1)  Keyboard input

\bigskip

2) Reading data into a data frame from a text file

\bigskip

3) Using the spreadsheet-like data editor in R

\bigskip

4) Importing data from some spreadsheet format (e.g., *.xls), or from SPSS (e.g., *.sav)

\bigskip

5) Accessing data that are already in R libraries

\end{frame}
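
\begin{frame}[fragile]

\frametitle{}

For option 4, one common route (a sketch, not the only way) is to save the spreadsheet as a CSV file and read it with \texttt{read.csv}.  The temporary file below stands in for an exported spreadsheet, and the SPSS file name in the comment is hypothetical.

\bigskip

{\footnotesize
\begin{verbatim}
# Stand-in for a spreadsheet exported as CSV
small <- data.frame(accidents = c(1, 4, 5),
                    vehicles  = c(4, 10, 15))
f <- tempfile(fileext = ".csv")
write.csv(small, f, row.names = FALSE)

back <- read.csv(f)   # returns a data frame
back

# For SPSS files, the 'foreign' package has read.spss:
# library(foreign)
# read.spss("mydata.sav", to.data.frame = TRUE)
\end{verbatim}
}

\end{frame}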

\begin{frame}

\frametitle{}
We have our community data in a text file called  communitydata.txt , with verbatim contents as follows:

\smallskip




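{\ttfamily
\begin{tabular}{lrrr}
  & accidents & vehicles & police \\
a & 1 & 4  & 20 \\
b & 4 & 10 & 6  \\
c & 5 & 15 & 2  \\
d & 4 & 12 & 8  \\
e & 3 & 8  & 9  \\
f & 4 & 16 & 8  \\
g & 2 & 5  & 12 \\
h & 1 & 7  & 15 \\
i & 4 & 9  & 10 \\
j & 2 & 10 & 10 \\
\end{tabular}
}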



\end{frame}

\begin{frame}

\frametitle{Some Commands to Try}



accidents = c(1,4,5,4,3,4,2,1,4,2)

\bigskip

comm.labels = c('a','b','c','d','e','f','g',
'h','i','j')

\bigskip

vehicles = scan()

\bigskip

4 10 15 12 8 16 5 7 9 10

\bigskip

police = scan()

\bigskip

20 6 2 8 9 8 12 15 10 10

\bigskip

community = data.frame(comm.labels,accidents,vehicles,police)





\end{frame}

\begin{frame}

\frametitle{}


community

\bigskip


community.altone = edit(as.data.frame(NULL))

\bigskip

community.altone

\bigskip

fix(community)




\end{frame}

\begin{frame}

\frametitle{}


community.alttwo = read.table('G:/r\_class\_material/communitydata.txt', header = TRUE)

\bigskip

community.alttwo


\end{frame}
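
\begin{frame}[fragile]

\frametitle{}

To try \texttt{read.table} without a file on disk, the same layout can be passed through its \texttt{text} argument.  This sketch uses only the first three communities; because the header line has one fewer field than the data lines, the first column becomes the row names.

\bigskip

{\footnotesize
\begin{verbatim}
txt <- "accidents vehicles police
a 1  4 20
b 4 10  6
c 5 15  2"

community.small <- read.table(text = txt,
                              header = TRUE)
rownames(community.small)
str(community.small)
\end{verbatim}
}

\end{frame}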

\begin{frame}

\frametitle{}


attach(community)

\bigskip

summary(community)

\bigskip

community.model = lm(accidents $\sim$ vehicles  + police)

\bigskip

summary(community.model)

\bigskip

plot(community.model)

\bigskip


\end{frame}
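
\begin{frame}[fragile]

\frametitle{}

An alternative to \texttt{attach} is to name the data frame through the \texttt{data} argument (or \texttt{with}), which avoids name clashes on the search path.  The object name \texttt{fit} is just a choice for this sketch.

\bigskip

{\footnotesize
\begin{verbatim}
community <- data.frame(
  accidents = c(1, 4, 5, 4, 3, 4, 2, 1, 4, 2),
  vehicles  = c(4, 10, 15, 12, 8, 16, 5, 7, 9, 10),
  police    = c(20, 6, 2, 8, 9, 8, 12, 15, 10, 10))

fit <- lm(accidents ~ vehicles + police,
          data = community)
coef(fit)

with(community, cor(accidents, vehicles))
\end{verbatim}
}

\end{frame}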

\begin{frame}

\frametitle{Accessing and Using Data in Libraries}

A package must first be ``installed'' (needed only once) and then loaded:

\bigskip


install.packages('car')

\bigskip

library(car)

\bigskip

data(Prestige)

\bigskip

attach(Prestige)

\bigskip

objects()

\bigskip

search()




\end{frame}

\begin{frame}

\frametitle{}



dim(Prestige)

\bigskip

mean(income)

\bigskip

summary(type)

\bigskip

summary(education)


\end{frame}

\begin{frame}

\frametitle{}



plot(income,prestige)

\bigskip

abline(lm(prestige $\sim$ income))

\bigskip

title('Scatterplot of Prestige vs. Income')

 \bigskip

identify(income,prestige,row.names(Prestige))



\end{frame}

\begin{frame}

\frametitle{User Defined Functions}


Functions are the core of R.  They take inputs and produce outputs.

\bigskip



hello = function(x) \textbraceleft

\bigskip

cat("hello",x,"$\backslash$n")

\bigskip

\textbraceright

\bigskip

hello("and goodbye")




\end{frame}

\begin{frame}

\frametitle{}

absolute = function(x) \textbraceleft

\bigskip

       if (x $<$ 0) \textbraceleft

       \bigskip

       return (-x) \textbraceright

       \bigskip

       else \textbraceleft

       \bigskip

       return(x)

       \bigskip

       \textbraceright

       \bigskip

      \textbraceright

       \bigskip

absolute(-5)


\end{frame}
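
\begin{frame}[fragile]

\frametitle{}

The \texttt{if}/\texttt{else} form above expects a single number.  A possible vectorized variant uses \texttt{ifelse}; the name \texttt{absolute.vec} is made up here, and the built-in \texttt{abs} already does this job.

\bigskip

{\footnotesize
\begin{verbatim}
# Elementwise choice between -x and x
absolute.vec <- function(x) ifelse(x < 0, -x, x)

absolute.vec(c(-5, 0, 3))
abs(c(-5, 0, 3))    # built-in equivalent
\end{verbatim}
}

\end{frame}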

\begin{frame}

\frametitle{}

factorial = function(x) \textbraceleft

\bigskip

ret = 1 

\bigskip


for(i in seq\_len(x)) \textbraceleft


\bigskip


ret = ret*i

\bigskip

\textbraceright

\bigskip

return(ret) 

\bigskip



\textbraceright

\bigskip

factorial(5)

\end{frame}

\begin{frame}

\frametitle{}

factorial.alt = function(x) \textbraceleft

\bigskip

i = 1

\bigskip

f = 1

\bigskip

while (i $< =$ x) \textbraceleft

\bigskip

      f = f*i

      \bigskip

      i = i  + 1

      \bigskip

      \textbraceright

      \bigskip

f  \textbraceright

\bigskip



factorial.alt(5)

\end{frame}
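
\begin{frame}[fragile]

\frametitle{}

The same computation can be sketched without an explicit loop.  The name \texttt{factorial.prod} is made up here; \texttt{seq\_len(0)} is empty, so the product is 1 for an input of 0.

\bigskip

{\footnotesize
\begin{verbatim}
# Product of 1, 2, ..., n
factorial.prod <- function(n) prod(seq_len(n))

factorial.prod(5)
factorial.prod(0)
base::factorial(5)   # built-in version
\end{verbatim}
}

\end{frame}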

\begin{frame}

\frametitle{}

ourhistogram = function(x,breaks =
 "Scott", col = "purple",...) \textbraceleft

 \bigskip

                hist(x,breaks = breaks,
 probability = TRUE, col = col, ...)

 \bigskip

               \textbraceright

                \bigskip

x = rnorm(200)

\bigskip

ourhistogram(x,xlab =
 "histogram of x", col = "green")

 \bigskip

\end{frame}

\begin{frame}

\frametitle{R Commander}

R Commander is a basic statistics GUI for R, written by John Fox of McMaster University (Sociology).

\bigskip


install.packages("Rcmdr",dependencies = TRUE)

\bigskip

library(Rcmdr)

\end{frame}

\begin{frame}

\frametitle{RWinEdt}

RWinEdt is a package that provides a plug-in for using WinEdt as an editor for R.  This gives syntax highlighting, indenting, and so forth.

\bigskip

Remember the shareware license information on WinEdt that we have in our \LaTeX{} presentation.  The Psychology Department has a site license.

\bigskip

Also, you might check out Tinn-R.

\end{frame}


\end{document} 