som_read_data
Purpose
Reads data from an ASCII file in SOM_PAK format.
Syntax
sD = som_read_data(filename)
sD = som_read_data(..., dim)
sD = som_read_data(..., 'missing')
sD = som_read_data(..., dim, 'missing')
Description
This function is offered for compatibility with SOM_PAK, a SOM software
package in C. It reads data from a file in SOM_PAK format.
The SOM_PAK data file format is as follows. The first line must
contain the input space dimension and nothing else. The following
lines are comment lines, empty lines or data lines. Unlike the programs
in SOM_PAK, this function can also determine the input dimension
from the first data line if the input space dimension line is
missing. Note that the SOM_PAK format is not fully supported: the
'weight' and 'fixed' properties of data vectors are ignored (they are
treated as labels).
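The dimension handling described above can be sketched in Python. This is for illustration only and is not the toolbox's MATLAB code. The helper name read_dimension and the detection heuristic are assumptions: the sketch counts the leading tokens of the first data line that parse as numbers or equal the missing-value marker.

```python
def _is_number(tok):
    """Return True if tok parses as a floating-point number."""
    try:
        float(tok)
    except ValueError:
        return False
    return True


def read_dimension(lines, missing="NaN"):
    """Return (dim, remaining_lines) for a list of SOM_PAK-style lines.

    Hypothetical helper. If the first line holds a single integer, it is
    taken as the input space dimension and dropped. Otherwise the dimension
    is guessed from the first data line by counting its leading tokens that
    are numbers or the missing-value marker.
    """
    first = lines[0].split() if lines else []
    if len(first) == 1 and first[0].isdigit():
        return int(first[0]), lines[1:]
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip empty lines and comment lines
        dim = 0
        for tok in tokens:
            if tok != missing and not _is_number(tok):
                break  # first non-numeric token starts the labels
            dim += 1
        return dim, lines
    raise ValueError("no data lines found")
```

A heuristic like this miscounts when a label looks like a number (for example '42'), which is one reason to pass dim explicitly when it is known.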
Each data line contains one data vector and its labels. Starting from the
beginning of the line, the values of the vector components come first,
separated by whitespace, followed by the labels, also separated by
whitespace. If the vector has missing values, the missing value marker
must be specified as the last input argument ('NaN' by default). Missing
values are stored as NaNs in the data struct.
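Splitting a single data line according to these rules might look like the following Python sketch. The function name parse_data_line is hypothetical, and the dimension is assumed to be already known. Tokens equal to the marker become NaN, and everything after the first dim tokens is kept as labels, which is also where the unsupported 'weight' and 'fixed' properties would end up.

```python
import math


def parse_data_line(line, dim, missing="NaN"):
    """Split one SOM_PAK data line into (vector, labels).

    Hypothetical helper. The first `dim` whitespace-separated tokens are
    vector components; tokens equal to `missing` become NaN. All remaining
    tokens are returned unchanged as string labels.
    """
    tokens = line.split()
    if len(tokens) < dim:
        raise ValueError(f"expected {dim} components, got {len(tokens)}")
    vector = [math.nan if tok == missing else float(tok) for tok in tokens[:dim]]
    labels = tokens[dim:]
    return vector, labels
```

For example, parse_data_line("1 x 1 x 1 3rdline missing_components", 5, "x") yields a vector with NaN in the second and fourth positions and the two labels.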
Comment lines start with '#'. Comment lines and empty lines are ignored,
except for comment lines that start with '#n' or '#l'. Such a line should
contain the names of the vector components ('#n') or the label names
('#l'), separated by whitespace.
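The comment-line rules can be sketched in Python as a small classifier. The function name parse_header_line is hypothetical, and the sketch assumes that '#n' and '#l' are followed by whitespace, as in the example file. A comment such as '#note' is therefore treated as an ordinary comment here.

```python
def parse_header_line(line):
    """Classify a SOM_PAK comment line.

    Hypothetical helper. Returns ("names", [...]) for a '#n' line,
    ("labels", [...]) for a '#l' line, and None for any other comment,
    which the reader simply ignores.
    """
    tokens = line.split()
    if not tokens or not tokens[0].startswith("#"):
        raise ValueError("not a comment line")
    if tokens[0] == "#n":
        return "names", tokens[1:]   # names of the vector components
    if tokens[0] == "#l":
        return "labels", tokens[1:]  # names of the label columns
    return None
```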
NOTE: The largest value MATLAB can represent (realmax) should not
appear in the input file. This is because the function sscanf cannot
read NaNs, so missing values are converted to realmax during reading.
A genuine realmax in the file would therefore be indistinguishable
from a missing value.
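The Python sketch below models the sentinel scheme described in the note. It is not the toolbox's code. The name read_with_sentinel is hypothetical, and Python's float() stands in for sscanf. It shows why a real realmax value is lost: after parsing, it cannot be told apart from a missing marker.

```python
import math
import sys

# Largest finite IEEE double; the same value as MATLAB's realmax.
REALMAX = sys.float_info.max


def read_with_sentinel(line, missing="NaN"):
    """Parse a line of numbers using a sentinel for missing values.

    Hypothetical model of the scheme in the note above.
    """
    # Step 1: replace each missing marker with the sentinel's text form.
    text = " ".join(repr(REALMAX) if tok == missing else tok for tok in line.split())
    # Step 2: parse everything as plain numbers (stands in for sscanf).
    numbers = [float(tok) for tok in text.split()]
    # Step 3: map every sentinel back to NaN, including genuine realmax values.
    return [math.nan if x == REALMAX else x for x in numbers]
```

Calling read_with_sentinel("1 1.7976931348623157e+308 3", "x") returns NaN in the middle position even though the file contained no 'x'.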
Required input arguments
filename (string) input filename
Optional input arguments
dim (scalar) input space dimension
missing (string) string used to denote missing components (NaNs);
default is 'NaN'
Output arguments
sD (struct) the resulting data struct
Examples
The basic usage is:
sD = som_read_data('system.data');
If you know the input space dimension beforehand, and the file does
not contain it on the first line, it helps if you specify it as the
second argument:
sD = som_read_data('system.data',9);
If the missing components in the data are marked with a string other
than 'NaN', you can specify the marker as the last argument, with or
without the dimension argument:
sD = som_read_data('system.data',9,'*')
sD = som_read_data('system.data','NaN')
Here is an example data file. Its missing components are marked with 'x',
so it would be read with som_read_data(filename,'x'):
5
#n one two three four five
#l ID
10 2 3 4 5 1stline label
0.4 0.3 0.2 0.5 0.1 2ndline label1 label2
# comment line: missing components are indicated by 'x':s
1 x 1 x 1 3rdline missing_components
x 1 2 2 2
x x x x x 5thline emptyline
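Putting the rules together, the Python sketch below parses the example file above. It is illustrative only and is not the toolbox implementation. The function name read_sompak and the dictionary keys 'data', 'labels', 'comp_names' and 'label_names' are hypothetical stand-ins for the fields of the real data struct. As an assumption of this sketch, a first line holding a single integer is always taken to be the dimension line.

```python
import math

EXAMPLE = """5
#n one two three four five
#l ID
10 2 3 4 5 1stline label
0.4 0.3 0.2 0.5 0.1 2ndline label1 label2
# comment line: missing components are indicated by 'x':s
1 x 1 x 1 3rdline missing_components
x 1 2 2 2
x x x x x 5thline emptyline
"""


def _count_components(tokens, missing):
    """Count leading tokens that are numbers or the missing marker."""
    n = 0
    for tok in tokens:
        if tok != missing:
            try:
                float(tok)
            except ValueError:
                break  # first non-numeric token starts the labels
        n += 1
    return n


def read_sompak(text, dim=None, missing="NaN"):
    """Parse SOM_PAK-formatted text into a dict (hypothetical field names)."""
    lines = text.splitlines()
    first = lines[0].split() if lines else []
    if len(first) == 1 and first[0].isdigit():
        # Assumed dimension line; an explicit dim argument takes precedence.
        dim = int(first[0]) if dim is None else dim
        lines = lines[1:]
    comp_names, label_names, data, labels = [], [], [], []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # empty line
        if tokens[0] == "#n":
            comp_names = tokens[1:]
        elif tokens[0] == "#l":
            label_names = tokens[1:]
        elif tokens[0].startswith("#"):
            continue  # ordinary comment
        else:
            if dim is None:
                dim = _count_components(tokens, missing)  # from first data line
            data.append([math.nan if t == missing else float(t) for t in tokens[:dim]])
            labels.append(tokens[dim:])
    return {"data": data, "labels": labels,
            "comp_names": comp_names, "label_names": label_names}


sD = read_sompak(EXAMPLE, missing="x")
```

In this sketch the fourth data line gets an empty label list and the fifth is all NaN, matching the file's own label text. Dropping the first line of EXAMPLE still gives a dimension of 5, detected from the first data line.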
See also