Exploratory Data Analysis (EDA) Toolbox


Contents

Version 1.0, November 2004

The EDA Toolbox comes with absolutely no warranty and is distributed under the GNU license. For details, see the license.txt file. This is free software, and you are welcome to redistribute it under certain conditions (see the license.txt file and the GNU license).

It is intended that this toolbox accompany the book Exploratory Data Analysis with MATLAB, Wendy and Angel Martinez, CRC Press, 2004. However, you do not need the book to use this toolbox.

We have several upgrades in mind (GUIs, functions), so please keep checking the download websites for new versions.

Acknowledgments

Disclaimer

Platform

Installation

Contents by category

Contents in alphabetical order

Data Sets

Documentation

GTM Toolbox

SOM Toolbox


Acknowledgments:

We would like to acknowledge code that has been contributed by other authors (under the GNU license):

Other references are available on the above websites and on the individual function help pages. We are also grateful for helpful suggestions regarding MATLAB coding from Tom Lane, The MathWorks, Inc.



Disclaimer:

This software and documentation are distributed in the hope that they will be useful, but they are distributed without any warranty and without even the implied warranty of correctness or fitness for a particular purpose.

The federal government, in particular the Department of the Navy and the Department of Defense, disclaims all responsibility for this software and any outcome from its use. In addition, this software and documentation do not reflect the views of, and are not endorsed by, the federal government or the Department of the Navy.

The code has been tested with care, but is not guaranteed to be free of defects and is not guaranteed for any particular purpose. Bug reports and suggestions for improvements are always welcome.



Platform:

This code has been developed for the Windows system, using MATLAB 6.5 and 7.0. The M-file functions should work on any platform, but the installation steps would be different. Consult your system files or The MathWorks website for more information.

Note that for full functionality, the user should have the MATLAB Statistics Toolbox, Version 4 or higher.
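
The requirement above can be checked from the MATLAB command line. The following is a minimal sketch, assuming that ver('stats') reports the Statistics Toolbox and that its Version field begins with the major version number; the variable names hasStats and majorVersion are illustrative only.

```matlab
% Sketch: check for the Statistics Toolbox, Version 4 or higher.
% ver('stats') returns an empty structure array when the toolbox is absent.
v = ver('stats');
if isempty(v)
    hasStats = false;
else
    % Version is a string such as '5.0'; read the leading integer.
    majorVersion = sscanf(v(1).Version, '%d', 1);
    hasStats = ~isempty(majorVersion) && majorVersion >= 4;
end
if hasStats
    disp('Statistics Toolbox, Version 4 or higher, found.');
else
    disp('Statistics Toolbox (Version 4 or higher) not found; some functions may not run.');
end
```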



Installation:

The following installation instructions are for Windows versions of MATLAB. This toolbox can also be used with Unix and Linux operating systems, with suitable changes to the directory structure for toolboxes and paths.

  1. First download the required files and save them in a temporary directory.
  2. Make a new directory under your current MATLAB toolbox installation. In most cases, this would be: C:\MATLAB7\toolbox\edatool
  3. Double-click on the .zip file and extract the files to the above directory. Note that you could also create this new directory in the unzipping process.
  4. The MATLAB search path must be updated for you to use the toolbox files from any directory. The search path is kept in the pathdef.m file. By default, it is stored in the following directory: C:\MATLAB7\toolbox\local
  5. Before starting MATLAB, open the file pathdef.m using any text editor. One way to do this is to double-click on the file from Windows Explorer, which opens it in the MATLAB text editor. Add the new directory to the path by inserting the entry matlabroot, '\toolbox\edatool;',... alongside the existing entries.
  6. Close and save the file. Start MATLAB. Type helpwin at the command line to bring up the Help Browser. Click on edatool for a list of the available functions in the EDA Toolbox.

Alternative way to set the path:

a. Start MATLAB.
b. Start the Set Path dialog box from the File menu in the MATLAB Command Window.
c. Add the new directory for the EDA Toolbox to the path. Hit the Save button to permanently save your changes to the pathdef.m file.
d. Close MATLAB and restart it to see the changes in the Help files.
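
The path can also be set from the command line. The following is a minimal sketch, assuming the files were extracted to the edatool directory named in step 2 and that the savepath function is available (MATLAB 7.0; earlier releases may need a different command). The variable names edaDir, status and onPath are illustrative only.

```matlab
% Sketch: add the EDA Toolbox directory to the search path.
% Assumes the files were extracted to <matlabroot>\toolbox\edatool.
edaDir = fullfile(matlabroot, 'toolbox', 'edatool');
if exist(edaDir, 'dir') == 7
    addpath(edaDir);       % affects the current session only
    status = savepath;     % writes pathdef.m; returns 0 on success
    if status ~= 0
        warning('Could not save pathdef.m; check write permission.');
    end
else
    disp(['Directory not found: ' edaDir]);
end
% exist returns 2 when an M-file of that name is on the search path.
onPath = (exist('csandrews', 'file') == 2);
```

If onPath is true, the toolbox function csandrews is visible from any directory, which suggests that the path update worked.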

Changing the MATLAB environment for the EDA Toolbox:

  1. To provide access to the toolbox via the START button, see the instructions in The MathWorks online documentation.
  2. To add the Help files to the Help Browser, see the instructions in The MathWorks online documentation.



Contents by Category

CLUSTERING
	adjrand	        	Adjusted Rand index to compare groupings.
	agmclust	      	Model-based agglomerative clustering.
	genmix	        	GUI to generate random variables from finite mixture.
	mbcfinmix	     	Model-based finite mixture estimation - EM.
	mbclust	        	Model-based clustering.
	mixclass	        Classification using mixture model.
	mojenaplot        'Mojena Rule' plot for estimating the number of clusters.
	plotbic	        	Plot the BIC values from model-based clustering.
	randind	        	Rand index to compare groupings.
	reclus	        	ReClus plot to visualize cluster output.
	rectplot	     	Rectangle plot to visualize hierarchical clustering.
	treemap	        	Treemap display for hierarchical clustering.

DATA TOURS
	csppstrtrem	    	Remove structure in PPEDA.
	intour	        	Interpolation tour of the data.
	kdimtour	      	k-dimensional grand tour.
	permtourandrews  	Permutation tour using Andrews’ curves.
	permtourparallel 	Permutation tour using parallel coordinate plots.
	ppeda	           	Projection pursuit EDA.
	pseudotour	    	Pseudo grand tour.
	torustour	      	Asimov grand tour.

DIMENSIONALITY REDUCTION
	gtm_pmd	        	Calculates posterior mode projection (GTM Toolbox).
	gtm_pmn	        	Calculates posterior mean projection (GTM Toolbox).
	gtm_stp2	       	Generates components of a GTM (GTM Toolbox).
	gtm_trn	        	Train the GTM using EM (GTM Toolbox).
	hlle	          	Hessian eigenmaps.
	idpettis	       	Intrinsic dimensionality estimate.
	isomap	        	ISOMAP nonlinear dimensionality reduction.
	lle	            	Locally linear embedding.
	nmmds	           	Nonmetric multidimensional scaling.
	som_autolabel	  	Automatic labeling (SOM Toolbox).
	som_data_struct	Create a data structure (SOM Toolbox).
	som_make	       	Create, initialize and train SOM (SOM Toolbox).
	som_normalize	   	Normalize data (SOM Toolbox).
	som_set	        	Set up SOM structures (SOM Toolbox).
	som_show	       	Basic SOM visualization (SOM Toolbox).
	som_show_add	    Shows hits, labels and trajectories (SOM Toolbox).

DISTRIBUTION SHAPES
	bagmat.exe	    	Executable file to get arrays needed for bagplot.
	bagplot	        	M-file to construct actual bagplot.
	boxp	         	Boxplot - regular.
	boxprct	        	Box-percentile plot.
	polarloess	    	Bivariate smoothing using loess.
	quantileseda	  	Sample quantiles.
	quartiles	       	Sample quartiles using Tukey’s fourths.

GUIs
	brushscatter      	Scatterplot brushing and linking.
	genmix		    	GUI to generate random variables from finite mixture.
	isomapeda      	Explore results of ISOMAP.
	scattergui	    	Scatterplot with interactive labeling.

SMOOTHING
	loess	         	1-D loess scatterplot smoothing.
	loess2 	        	2-D loess smoothing from Data Visualization Toolbox.
	loessenv	     	Loess upper and lower envelopes.
	loessr	       	 	Robust loess scatterplot smoothing.
	polarloess	    	Bivariate smoothing using loess.

VISUALIZATION
	brushscatter	    Scatterplot brushing and linking.
	coplot	        	Coplot from Data Visualization Toolbox.
	csandrews	       	Andrews' curves plot.
	csparallel	    	Parallel coordinates plot.
	dotchart	       	Dot chart plot.
	hexplot	        	Hexagonal binning for scatterplot.
	mojenaplot        'Mojena Rule' plot for estimating the number of clusters.
	multiwayplot	    Multiway dot charts.
	plotbic	        	Plot the BIC values from model-based clustering.
	plotmatrixandr		Plot matrix of Andrews' curves.
	plotmatrixpara		Plot matrix of parallel coordinates.
	reclus	        	ReClus plot to visualize cluster output.
	rectplot	      	Rectangle plot to visualize hierarchical clustering.
	scattergui	    	Scatterplot with interactive labeling.
	treemap	        	Treemap display for hierarchical clustering.


Data Sets

	abrasion        Rubber experiments
	animal          Brain weights and body weights of animals
	bank            Genuine and forged money
	calibrat        Radioactivity counts to hormone level
	cereal          Assessment of eight brands of cereal
	environmental   Environmental variables: ozone, solar radiation, temperature, wind speed
	ethanol         Compression ratio, equivalence ratio, NO_x for a single-cylinder engine
	forearm         Length in inches of the forearm of adult males
	example*        Various data sets for the examples in the book
	galaxy          Velocities of the spiral galaxy
	geyser          Waiting times in minutes between eruptions of the Old Faithful geyser
	hamster         Organ weights for hamsters with congenital heart failure
	iradbpm         Interpoint distance matrix (IRad) for the BPM data set
	iris            Fisher's iris data. Species are in different arrays.
	L1bpm           Interpoint distance matrix (L1) for the BPM data set
	leukemia        Gene expression level of patients with leukemia
	livestock       Livestock counts from a 1987 census of farm animals
	lsiex           Term-document matrix used in Example 2.3, illustrating LSI
	lungA           Gene expression data for lung cancer
	lungB           Gene expression data for lung cancer    
	matchbpm        Interpoint distance matrix (match coefficient) for the BPM data set
	ochiaibpm       Interpoint distance matrix (Ochiai similarity measure) for the BPM data set
	oronsay         Particle size measurements for sand
	ozone           Daily maximum ozone concentrations at ground level on 132 days in 1974
	playfair        Populations of 22 cities and diameters used for display by William Playfair
	pollen          Artificial data set created for 1986 Joint Statistical Meetings
	posse           Data sets from Christian Posse, used for projection pursuit EDA
	salmon          Size of spawning stock of salmon along the Skeena River
	scurve          Data from an S-curve manifold
	singer          Height in inches of singers
	skulls          Measurements of 40 skulls. 18 are female and 22 are male
	software        Data on software inspections
	spam            Attributes of email, some of which are spam
	sparrow         Measurements on sparrows. The first 21 survived, and the rest died.
	swissroll       Data from the Swiss roll manifold
	votfraud        Democratic and Republican pluralities of voting machines and absentee votes
	yeast           Gene expression data for yeast cells - two cell cycles and five phases
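
A data set can be inspected before it is loaded. The following is a minimal sketch, assuming the iris data are stored in a MAT-file named iris.mat on the search path; the file name is an assumption, and no variable names inside the file are assumed.

```matlab
% Sketch: list the variables in a data set file, then load it.
% The file name iris.mat is an assumption; substitute the actual file.
dataFile = 'iris.mat';
if exist(dataFile, 'file') == 2
    whos('-file', dataFile);    % show names and sizes without loading
    S = load(dataFile);         % load into a structure, not the workspace
    names = fieldnames(S);      % cell array of the variable names
else
    names = {};
    disp([dataFile ' is not on the search path.']);
end
```

Loading into a structure keeps the data set from overwriting variables that are already in the workspace.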


Contents in Alphabetical Order

adjrand	        	Adjusted Rand index to compare groupings.
agmclust	    	Model-based agglomerative clustering.
bagmat.exe	    	Executable file to get arrays needed for bagplot.
bagplot	        	M-file to construct actual bagplot.
boxp	        	Boxplot - regular.
boxprct	        	Box-percentile plot.
brushscatter		Scatterplot brushing and linking.
coplot	        	Coplot from Data Visualization Toolbox.
csandrews	    	Andrews' curves plot.
csparallel	    	Parallel coordinates plot.
csppstrtrem	    	Remove structure in PPEDA.
dotchart	    	Dot chart plot.
genmix	        	GUI to generate random variables from finite mixture.
gtm_pmd	        	Calculates posterior mode projection (GTM Toolbox).
gtm_pmn	        	Calculates posterior mean projection (GTM Toolbox).
gtm_stp2	    	Generates components of a GTM (GTM Toolbox).
gtm_trn	        	Train the GTM using EM (GTM Toolbox).
hexplot	        	Hexagonal binning for scatterplot.
hlle	        	Hessian eigenmaps.
idpettis	    	Intrinsic dimensionality estimate.
intour	        	Interpolation tour of the data.
isomap	        	ISOMAP nonlinear dimensionality reduction.
isomapeda         	Explore results of ISOMAP.
kdimtour	       	k-dimensional grand tour.
lle	            	Locally linear embedding.
loess	           1-D loess scatterplot smoothing.
loess2 	        	2-D loess smoothing from Data Visualization Toolbox.
loessenv	     	Loess upper and lower envelopes.
loessr	        	Robust loess scatterplot smoothing.
mbcfinmix	       	Model-based finite mixture estimation - EM.
mbclust	        	Model-based clustering.
mixclass	        Classification using mixture model.
mojenaplot        'Mojena Rule' plot for estimating the number of clusters.
multiwayplot	    Multiway dot charts.
nmmds	          	Nonmetric multidimensional scaling.
permtourandrews  	Permutation tour using Andrews’ curves.
permtourparallel 	Permutation tour using parallel coordinate plots.
plotbic	        	Plot the BIC values from model-based clustering.
plotmatrixandr		Plot matrix of Andrews’ curves.
plotmatrixpara		Plot matrix of parallel coordinates.
polarloess	    	Bivariate smoothing using loess.
ppeda	        	Projection pursuit EDA.
pseudotour	    	Pseudo grand tour.
quantileseda		Sample quantiles.
quartiles	     	Sample quartiles using Tukey’s fourths.
randind	        	Rand index to compare groupings.
reclus	        	ReClus plot to visualize cluster output.
rectplot	    	Rectangle plot to visualize hierarchical clustering.
scattergui	    	Scatterplot with interactive labeling.
som_autolabel		Automatic labeling (SOM Toolbox).
som_data_struct	Create a data structure (SOM Toolbox).
som_make	    	Create, initialize and train SOM (SOM Toolbox).
som_normalize		Normalize data (SOM Toolbox).
som_set				Set up SOM structures (SOM Toolbox).
som_show	    	Basic SOM visualization (SOM Toolbox).
som_show_add		Shows hits, labels and trajectories (SOM Toolbox).
torustour	    	Asimov grand tour.
treemap	        	Treemap display for hierarchical clustering.