
Analysis of some extensions of the Self-Organized Maps: Evolving SOMs (ESOM) (1), Growing-Hierarchy SOMs (GHSOM) (2), Relative-Density SOMs (ReDSOM) (3)


Domenico Leuzzi

Abstract. We describe and analyze some interesting extensions of the Self-Organized Map (SOM) algorithm as originally proposed by Kohonen, namely the Evolving SOM (ESOM) and the Growing-Hierarchy SOM (GHSOM), as well as a visualization method for identifying the changes of the cluster structures in temporal datasets, called ReDSOM. The ESOM algorithm differentiates itself from its parent SOM because the topology of the map it creates is not fixed as in the SOM but is adaptively built on the basis of the dataset distribution, thus reducing the number of map units required to achieve a given quantization error. The GHSOM algorithm tries to reproduce the hierarchy of the dataset by creating a multilevel map in which each unit of a map level can be explained more deeply by a map on the next level. The multilevel approach not only optimizes the use of the units (a level map is added only if it is necessary to improve its quantization error) but also allows quicker navigation of the maps obtained. The ReDSOM visualization method is useful when we have a dataset which evolves in time and we want to compare, in the map space, the clustering structure of dataset snapshots taken at two different time instants. This method allows identifying visually, by means of different colorings, emerging/lost clusters, cluster enlargement/shrinking, more/less dense clusters, and cluster movement.

Introduction

We start in section 2 with a brief presentation of the SOM algorithm as originally developed by Kohonen (4; 5). In section 3 we describe the tools used for the experimental tests. Section 4 is dedicated to the ReDSOM visualization method, and sections 5 and 6 to the GHSOM and ESOM algorithms respectively.

Self-Organizing Maps

A SOM (also known as SOFM (Self-Organized Feature Map), or Kohonen Map) is an artificial neural network based on unsupervised competitive learning (4; 5). A low-dimensional grid of neurons (aka units), usually 2-D, is built following a fixed and predetermined topology (i.e. rectangular or hexagonal). This grid constitutes the so-called map space (or output space). Whichever topology is used, each unit is connected with a number of neighboring units which are equidistant in the map space: in the rectangular topology each unit is surrounded by four equidistant neighboring units, and in the hexagonal topology it is surrounded by six units¹. The grid units are initialized in the data space, that is, each unit weight vector (aka codebook vector, prototype vector or reference vector) is given an initial value taken from the input space. The initialization can be random or linear. In the latter case the initial values are chosen in an orderly fashion along the first principal components, as many as the map space dimensionality. The map is then trained using a competitive unsupervised learning algorithm. At each training step t, a randomly selected data point x(t) is chosen from the dataset. Then the best matching unit (BMU) corresponding to x(t), i.e. the unit c with the weight vector closest to x(t), is selected from the map in accordance with (1)

$$c = \arg\min_i \, \| \mathbf{x}(t) - \mathbf{w}_i(t) \| \qquad (1)$$

After that, not only the BMU but also all its neighbors are adjusted according to the adaption rule (2)

$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \alpha(t)\, h_{ci}(t)\, \left[ \mathbf{x}(t) - \mathbf{w}_i(t) \right], \qquad i = 1, \dots, N \qquad (2)$$

where

$$h_{ci}(t) = \exp\left( -\frac{\| \mathbf{r}_c - \mathbf{r}_i \|^2}{2\, \sigma^2(t)} \right)$$

is the neighborhood kernel function, dependent on the map-space distance between the winning neuron c and the neighbor unit i, as well as on the time t. The parameter σ(t) controls the size of the zone of neurons around the winning one that are affected by the update, while α(t) is a decreasing function of time (e.g. linear or exponential) controlling the strength of the adaption.
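As an illustration, the following is a minimal sketch of a single sequential training step implementing (1) and (2). It is not taken from the SOM Toolbox; the function name, argument layout and the choice of a Gaussian neighborhood kernel are ours.

% One sequential SOM training step. W: munits x dim codebook matrix;
% R: munits x 2 grid coordinates of the units in the map space;
% x: current 1 x dim input sample; alpha: learning rate alpha(t);
% sigma: neighborhood radius sigma(t).
function W = som_sequential_step(W, R, x, alpha, sigma)
    [~, c] = min(sum((W - x).^2, 2));                 % best matching unit, eq. (1)
    h = exp(-sum((R - R(c,:)).^2, 2) / (2*sigma^2));  % neighborhood kernel h_ci(t)
    W = W + alpha * (h .* (x - W));                   % adaption rule, eq. (2)
end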

¹ The border units are surrounded by fewer units, unless the lattice is wrapped in a cylindrical or toroidal structure.

Tools used for the experimental tests

For implementing the ReDSOM visualization method we used the SOM Toolbox 2.0², a powerful package developed for MATLAB that allows managing every aspect of the SOMs, from the initialization to the training, ending up with the visualization both in the map space and in the data space or the projected space. Detailed information can be obtained from the documentation and from the source code. For the implementation of the GHSOM algorithm we used the Java SOMToolbox package. It contains several modules to train and visualize a SOM. In particular we used the module GHSOM, to grow a hierarchy of maps, and SOMViewer, to visualize the results. We also used the ESOM Toolbox for MATLAB³ to run some tests on the ESOM algorithm.

ReDSOM (3)

Suppose we have a temporal dataset D(t), that is a dataset whose distribution varies with time. We want to compare the clustering structure of the dataset at two different time instants t1 and t2, that is, of the two datasets D(t1) and D(t2). The clustering itself and its comparison are much simpler if carried out in the low-dimensional space of two SOMs of equal topology, M(t1) and M(t2), trained on those two datasets D(t1) and D(t2). In order to be able to compare such two maps directly, they must have the same orientation, and the datasets on which they are trained have to be normalized using the same normalization method and parameters. So the procedure to obtain two maps that can be compared by the method discussed here is as follows:
1. Normalize both datasets D(t1) and D(t2) using the same normalization method (e.g. the common z-score) and the same parameters.
2. Initialize map M(t1) using ordered (linear) values.
3. Train map M(t1) using the dataset D(t1).
4. Initialize map M(t2) using the codebook vectors of the previously trained map M(t1).
5. Train map M(t2) using the dataset D(t2).
The maps so obtained are directly comparable with each other. That is, if we define a density function on the data space, related to the density of the map units in the data space (that is, considering their prototype vectors), we can compare one-to-one the density of the units of the two maps (the two units occupying the same position on the two maps are compared together).
² The package is available at the URL http://www.cis.hut.fi/projects/somtoolbox/download/
³ The package is available at the URL http://www.aut.ac.nz/__data/assets/file/0015/10176/ecos_esom.zip

4.1 Area Density and Relative Density Definitions

We define the area density d_k(v) of the map M(t_k), calculated at the data-space vector v, as the sum of the values of a Gaussian kernel function centered on the vector v and evaluated at the prototype vectors of the map units, as shown in (3)

$$d_k(\mathbf{v}) = \sum_{i=1}^{N} \exp\left( -\frac{\| \mathbf{v} - \mathbf{w}_i \|^2}{2\, r^2} \right) \qquad (3)$$

where the w_i are the prototype vectors of map M(t_k) and N is the number of its units.

The radius r defines the width of the kernel function, and its value should be chosen in accordance with the mean distance between neighboring units. It was observed that a quartile of these distances (e.g. the third quartile) is a balanced choice. Now we define the relative density rd(v) as the base-two logarithm of the ratio between the area density of the map M(t2) and the area density of the map M(t1), both calculated at the same location vector v, as shown in (4)

$$\mathrm{rd}(\mathbf{v}) = \log_2 \frac{d_2(\mathbf{v})}{d_1(\mathbf{v})} \qquad (4)$$

The use of the logarithm in (4) allows the density ratio to be converted to negative values when the ratio is below 1 (decrease of density) and to positive values when the ratio is above 1 (increase of density). The base-two logarithm gives a more convenient scale. For example, a value of +2 indicates a density four times higher on the map M(t2), while a value of −2 indicates a density four times lower on the map M(t2). Based on experimental observation, values of rd(v) less than −3 indicate that the location of vector v is no longer occupied on the next map M(t2) (it is lost), while values greater than +3 indicate that the location of vector v was not occupied on the previous map M(t1) but is occupied on the next map M(t2) (it is new). The relative density calculation is to be performed only on the prototype vectors of the two maps M(t1) and M(t2), and not on the actual data vectors. So the running time of the calculation is quadratic in the number of map units and not in the number of data points, which is typically much larger.

4.2 Relative Density Visualization

As said in the previous section, the relative density calculation is to be performed only on the prototype vectors of the two maps M(t1) and M(t2). Let us write rd1 and rd2 as shorthand for the relative density values calculated, respectively, on the prototype vectors of the first map and on the prototype vectors of the second map.

We visualize rd1 and rd2 on the respective maps, in a gradation of blue for positive values and in a gradation of red for negative values, as shown in Fig. 2. The rd1 visualization should be used to detect a density decrease at the vectors of the first map (negative values of relative density). In fact, if we want to detect whether a vector of the first map has been lost or has decreased its density in the second map, we have to choose, for the calculation of the relative density, a location where that vector is surely present, that is, the reference vectors of the first map. Similarly, the rd2 visualization should be used to detect the vectors that in the second map have increased their density with respect to the same vector locations on the first map.
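As a concrete hint of how such a coloring could be produced with the SOM Toolbox, a possible fragment is sketched below. It assumes the relative densities rd{1} and rd{2} of section 4.3 have already been computed, one value per map unit, and uses a hand-built red-white-blue colormap; the variable names follow those of section 4.3.

% Possible sketch: color each unit of the first trained map by rd{1}
% (red for negative, blue for positive relative density).
% Assumes rd{1} is a munits x 1 vector aligned with sM_t{1}.codebook.
figure;
som_cplane(sM_t{1}.topol.lattice, sM_t{1}.topol.msize, rd{1});
n = 32;                                            % red -> white -> blue colormap
cmap = [ [ones(n,1);        linspace(1,0,n)'], ... % R: 1, then 1 -> 0
         [linspace(0,1,n)'; linspace(1,0,n)'], ... % G: 0 -> 1, then 1 -> 0
         [linspace(0,1,n)'; ones(n,1)       ] ];   % B: 0 -> 1, then 1
colormap(cmap); caxis([-3 3]); colorbar;           % saturate at the +/-3 thresholds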

4.3 MATLAB Implementation using SOM Toolbox

To implement the Relative Density Visualization algorithm in MATLAB using the SOM Toolbox, first of all it is necessary to initialize and train the two maps relative to the two snapshots of the dataset we want to compare, following the procedure indicated in section 4 to obtain two directly comparable maps. The MATLAB code we used to do that is as follows:

sD{1} = som_normalize(sD{1}, 'var');   % normalize sD{1} using the 'var' method (z-score)
sD{2} = som_normalize(sD{2}, sD{1});   % normalize sD{2} using the same method and parameters as sD{1}
sM{1} = som_lininit(sD{1});            % initialize map 1 linearly
sM{1}.comp_names{1} = 'x';
sM{1}.comp_names{2} = 'y';
% Train map 1 (batch training); the trained map is stored in sM_t{1}
sTr1 = som_train_struct(sM{1}, sD{1}, 'algorithm', 'batch', 'phase', 'rough');
sM_t{1} = som_batchtrain(sM{1}, sD{1}, sTr1);
sTr1 = som_train_struct(sM{1}, sD{1}, 'algorithm', 'batch', 'phase', 'finetune');
sM_t{1} = som_batchtrain(sM_t{1}, sD{1}, sTr1);
sM{2} = sM{1};                         % map 2: same topology and codebook vectors as map 1
% Train map 2 (batch training); the trained map is stored in sM_t{2}
sTr2 = som_train_struct(sM{2}, sD{2}, 'algorithm', 'batch', 'phase', 'rough');
sM_t{2} = som_batchtrain(sM{2}, sD{2}, sTr2);
sTr2 = som_train_struct(sM{2}, sD{2}, 'algorithm', 'batch', 'phase', 'finetune');
sM_t{2} = som_batchtrain(sM_t{2}, sD{2}, sTr2);

See the SOM Toolbox documentation for the meaning of each function and structure. The variables sD, sM, sM_t are cell arrays containing respectively the two datasets, the two untrained maps and the two trained maps. We preferred to keep the untrained and the trained maps in separate variables. We used the batch training algorithm because it speeds up the training. The MATLAB code we used to calculate the relative densities rd1 and rd2 is as follows:

[c1, p1, err1, ind1] = kmeans_clusters(sM_t{1});   % cluster the codebook vectors of map 1
[density1, radius]   = som_density(sM_t{1}, sM_t{1}.codebook, 'kp', p1{dataset1_knum});
[density2]           = som_density(sM_t{2}, sM_t{1}.codebook, 'radius', radius);
rd{1} = log2(density2 ./ density1);
[density1]           = som_density(sM_t{1}, sM_t{2}.codebook, 'radius', radius);
[density2]           = som_density(sM_t{2}, sM_t{2}.codebook, 'radius', radius);
rd{2} = log2(density2 ./ density1);

The first line calculates the clustering of the codebook vectors using the function kmeans_clusters. The returned variable p1 is a cell array whose k-th position contains the clustering information for a partition into k clusters (dataset1_knum holds the number of clusters chosen for the first dataset). The partitioning of the prototype vectors is needed for the calculation of the radius parameter of the Gaussian function used in the expression of the area density. The radius is calculated in the next code line (the first som_density invocation) for the first area density calculation. The following three area density calculations reuse the radius returned by the first invocation and do not require the clustering information, because they do not need to recompute the radius. The function som_density is not part of the SOM Toolbox package; its salient part is the calculation of the radius, reported in the following code fragment:

U = som_umat(M, sTopol, mode, 'mask', mask);
[mean_neighbors_dist_cluster] = neighbors_dist(U, sTopol.msize, sTopol.lattice, kp, knum);
mean_neighbors_dist = mean(mean_neighbors_dist_cluster);
r = quart * mean_neighbors_dist;

In this fragment, the first line calculates the U-distance matrix and the second line calculates the mean distances between neighbors in each prototype vector cluster; the parameter kp is a vector containing the clustering information of each prototype vector and knum is the number of clusters.
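For completeness, here is a minimal self-contained sketch of how the Gaussian kernel sum of (3) could be computed; the function name and its interface are ours and only approximate the som_density helper used above.

% Area density of eq. (3): sum of Gaussian kernels centred at each location
% in V (m x dim), evaluated over the prototype vectors W (munits x dim),
% with kernel radius r.
function density = area_density(W, V, r)
    m = size(V, 1);
    density = zeros(m, 1);
    for k = 1:m
        d2 = sum((W - V(k,:)).^2, 2);           % squared distances to all prototypes
        density(k) = sum(exp(-d2 / (2*r^2)));   % Gaussian kernel sum with radius r
    end
end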

4.4 Results on synthetic datasets

First synthetic example. Fig. 1 shows the datasets used to demonstrate how the relative density visualization method performs when there are lost clusters, new clusters, and changes in cluster density. The two datasets are constituted by a superposition of four sets of normally-distributed (Gaussian) 2-D data points. The variance of each normally distributed set was set to the common value 0.2, while the mean values were chosen so that the resulting sets practically do not overlap. The figure shows both the dataset points and the denormalized codebook vectors of the trained maps. Comparing the top portion of the figure with the bottom one we can see that, going from the first to the second, cluster A is lost, a new cluster E appears, and the two clusters B and D change density: the first becomes denser and the second less dense.

Fig. 1. Datasets used to show the capacity of the Relative Density Visualization to detect lost clusters, new clusters, as well as clusters with a density variation. Besides the dataset points (blue points), the figure shows the denormalized codebook vectors of the maps trained on them (red crosses). Both datasets are constituted by four sets of normally-distributed (Gaussian) 2-D points. Each normally-distributed set was generated with a common value of variance (0.2), while the mean values were set so that the four groups are practically non-overlapping. Going from the first dataset (a) to the second one (b) we can see that there is a lost cluster (A), a new cluster (E), a denser cluster (B) and a less dense cluster (D); cluster C remains unchanged.
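A possible way to generate such a synthetic snapshot in MATLAB is sketched below; the cluster means used here are illustrative values of ours, chosen only so that the groups do not overlap, and the wrapping into a SOM Toolbox data struct mirrors the sD{1} variable of section 4.3.

% Hedged sketch: four well-separated Gaussian clusters in 2-D, each with
% variance 0.2 (standard deviation sqrt(0.2)); the means are assumed values.
rng(1);                                   % reproducibility
means = [0 0; 3 0; 0 3; 3 3];             % illustrative cluster centres A, B, C, D
sigma = sqrt(0.2);
X = [];
for k = 1:size(means, 1)
    X = [X; means(k,:) + sigma * randn(200, 2)];    % 200 points per cluster
end
sD{1} = som_data_struct(X, 'name', 'snapshot 1');   % wrap as a SOM Toolbox data struct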

Fig. 2 shows the two trained maps using both the usual visualizations (component planes and U-distance matrix) and the relative density visualization. The top portion of the figure shows the visualizations relative to the first map, whilst the bottom portion is relative to the second map. The rd1 visualization clearly shows a region of strong red coloring, which is associated with a very low value of relative density (below −3). This region corresponds to the cluster A, which is lost going toward the second map. There are two other regions, one colored in light blue and the other in light red. The light blue indicates an increase of density (cluster B), while the light red indicates a decrease of density (cluster D). Lastly, there is a neutral zone (white color, cluster C) which corresponds to an unchanged cluster. As mentioned before, rd1 is able to show the changes of the clusters present in the first map (A, B, C, D), but it cannot show the changes relative to clusters that are only present in the second map, that is, it cannot detect the creation of new clusters (like cluster E). That kind of information is instead obtainable from rd2. The visualization of rd2 in the bottom part of the figure clearly shows a zone with a strong relative density value (above +3), corresponding to the dark blue coloring. This region is associated with the emerging cluster E. The other regions (light blue, light red, white) are the same as those identified by the rd1 visualization.

Fig. 2. Visualizations of the maps trained with the datasets shown in Fig. 1a (top) and Fig. 1b (bottom). For each map are drawn the two component planes, the U-distance matrix and the relative density calculated on the units (that is, on their codebook vectors) of the considered map. The contours of the clusters, as obtained from the U-distance matrix, are outlined (blue indicates a high distance in the data space while yellow indicates a low distance, so units colored blue represent regions of separation between clusters). The relative density on the first map units (rd1) indicates that cluster A is lost (strong red coloring, i.e. rd1 strongly negative), cluster B has increased its density (light blue, weakly positive), cluster C is unchanged (white, about zero) and cluster D has decreased its density (light red, weakly negative). The relative density calculated on the second map units (rd2) highlights the presence of the emerging cluster E (dark blue coloring, high positive value); it also gives about the same information about clusters B, C, D as rd1. It is worth noting, however, that rd1 is more suitable to show a decrease of density (the light red coloring of cluster D is more uniform on rd1 than on rd2), while rd2 is more suitable to show an increase of density (the light blue coloring of cluster B is more uniform on rd2 than on rd1).

Second synthetic example. Fig. 3 shows the datasets used to demonstrate how the relative density visualization method is able to highlight a shift of a cluster centroid. The two datasets are similar to the ones used in Fig. 1. The difference is that in the second dataset cluster A is no longer lost but shifts its centroid, and that the emerging cluster E is no longer present. The figure shows both the dataset points and the denormalized codebook vectors of the trained maps. The denser and less dense clusters B and D are the same as in the datasets of Fig. 1.

Fig. 3. Datasets used to show the capacity of the Relative Density Visualization to detect a shift of a cluster centroid. Besides the dataset points (blue points), the figure shows the denormalized codebook vectors of the maps trained on them (red crosses). Both datasets are constituted by four sets of normally-distributed (Gaussian) 2-D points. Each normally-distributed set was generated with a common value of variance (0.2), while the mean values were set so that the four groups are practically non-overlapping. Going from the first dataset (a) to the second one (b) we can see that there is a shifted cluster (A), a denser cluster (B) and a less dense cluster (D); cluster C remains unchanged.

Fig. 4 shows the two trained maps using both the usual visualizations (component planes and U-distance matrix) and the relative density visualization. The top portion of the figure shows the visualizations relative to the first map, whilst the bottom portion is relative to the second map. We can draw the same conclusions about the clusters B, C, D as we did for the previous example shown in Fig. 2: in the visualizations rd1 and rd2, cluster B has increased its density (light blue coloring), cluster D has decreased its density (light red coloring) and cluster C is about unchanged (there is a very light red coloring but it is negligible). See the previous example for more details. What deserves our attention is the region of cluster A: there is a light red coloring on rd1 and a light blue coloring on rd2, neither covering the entire cluster. When the border of the colored region crosses the inner part of a cluster, it means there is a cluster enlargement (blue coloring on rd2 and no coloring on rd1), a cluster shrinking (red coloring on rd1 and no coloring on rd2), or a shift of the cluster centroid (both red coloring on rd1 and blue coloring on rd2). Our case corresponds to the third configuration (both partial colorings): indeed cluster A undergoes a centroid shift.

Fig. 4. Visualizations of the maps trained with the datasets shown in Fig. 3a (top) and Fig. 3b (bottom). For each map are drawn the two component planes, the U-distance matrix and the relative density calculated on the units (that is, on their codebook vectors) of the considered map. The contours of the clusters, as obtained from the U-distance matrix, are outlined (blue indicates a high distance in the data space while yellow indicates a low distance, so units colored blue represent regions of separation between clusters). The relative densities rd1 and rd2 give the same indication about clusters B, C, D as in the previous example shown in Fig. 2: see that example for more details. What deserves attention is the region of cluster A: there is a light red coloring on rd1 and a light blue coloring on rd2, neither covering the entire cluster. When the border of the colored region crosses the inner part of a cluster, it indicates an enlargement (blue coloring only), a shrinking (red coloring only), or a centroid shift (both colorings), as is the case for cluster A here.

GHSOM (2)

The topology preservation capability of the Self-Organized Maps allows creating a low-dimensional representation of a dataset, e.g. of a collection of documents, so as to organize it and make it easy to search for the desired information. As the amount of information to be represented grows, the map needed to organize it becomes larger. A large map, even if low-dimensional, used to represent the whole dataset makes it hard to find a particular data item of interest. Moreover, in the single-map representation, although the reduction of dimensionality simplifies the visualization of the data, the hierarchical structure of the data itself is lost. The Growing-Hierarchical Self-Organized Map is conceived with the idea of distributing the dataset to be represented over several distinct sub-maps, each specialized on a specific portion of the data space and linked together by a hierarchical relationship. In addition, each sub-map can grow in size to fit the level of detail needed. This multilevel approach not only performs the topology-preserving dimensionality reduction of a dataset proper of the ordinary SOM, but also makes it possible to maintain, to some degree, its hierarchical structure.

5.1 The algorithm

The key idea is to use multiple layers of distinct SOMs. The first layer contains only one SOM. Each unit of this map can be expanded into a finer SOM in the next (lower) layer. The same applies to the units of the maps of this new layer, and so the algorithm goes on until a predetermined level of detail is reached (see Fig. 5). In addition, for every map added to the structure an incrementally growing version of the SOM is used: we start from a simple 2x2 map and eventually grow it if, after its training, the mapping quality is not satisfying. We start at layer zero with a very rough representation of the data, just a single map unit whose weight vector is set to the mean of all the dataset vectors; this first unit only has the purpose of calculating the initial quantization error qe0 associated with the data. In general, the quantization error qe of a unit is calculated as the sum of all the distances between the weight vector of the unit and the data vectors mapped onto that unit; in particular, qe0 represents how far, in total, the dataset vectors are from their mean vector.

We proceed with the first true SOM at layer one, starting from a small 2x2 configuration, which is trained with the standard SOM algorithm. For each SOM, the training process is repeated for a fixed number of iterations. When the training process of a SOM is done, its mean quantization error MQE is calculated. The MQE of a map is a mapping quality index defined as the average of the quantization errors of all the units of that SOM. If the MQE of the map just added and trained is higher than a predefined fraction τ1 of the qe of the unit in the preceding layer to which the map is linked, a new row or a new column of units is added to the SOM. The point of addition is set between the map unit with the highest qe (called the error unit) and its most dissimilar (in terms of weight vector) neighbor unit. The weights of the added units are initialized as the average of their neighbors, and the training procedure is repeated as said above. When the growth process is concluded, we can say that the new SOM represents the preceding-layer unit from which it is expanded, but at a higher detail⁴. The units of an added SOM whose quantization error is too high, that is higher than a predefined fraction τ2 of the initial quantization error at layer zero, qe0, are expanded into a SOM in the next lower layer. The parameter τ2 controls the granularity of the data representation in each final unit of the hierarchy (a unit not expanded into a further map): the lower this parameter, the more units require expansion, and so the deeper the hierarchy produced. Summing up, the structure can grow both in breadth and in depth, and the shape of the hierarchy is controlled by the two parameters τ1 and τ2: the size of each single map tends to increase as τ1 gets lower, while the depth of the hierarchy, that is its expansion level, increases as τ2 gets lower.
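To make the two growth criteria concrete, the following is a minimal self-contained sketch of how they could be evaluated for a trained map. The function name and its interface are ours (it is not part of any toolbox); tau1 and tau2 stand for the parameters τ1 and τ2 described above, and the MQE is taken, as in the text, as the average of the quantization errors of all the units.

% Hedged sketch: GHSOM stopping tests for a trained map.
% W: munits x dim codebook of the map; D: n x dim data mapped onto it;
% qe_parent: quantization error of the parent unit; qe0: layer-0 quantization error.
function [grow_map, expand_units] = ghsom_tests(W, D, qe_parent, qe0, tau1, tau2)
    munits = size(W, 1);
    qe = zeros(munits, 1);
    for k = 1:size(D, 1)
        dist = sqrt(sum((W - D(k,:)).^2, 2));   % distances of data vector k to all units
        [dmin, bmu] = min(dist);                % its best matching unit
        qe(bmu) = qe(bmu) + dmin;               % qe of a unit: summed distances to its data
    end
    MQE = mean(qe);                             % mean quantization error of the map
    grow_map     = MQE > tau1 * qe_parent;      % breadth criterion: add a row/column
    expand_units = find(qe > tau2 * qe0);       % depth criterion: units to expand below
end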

Fig. 5. Hierarchical structure of a GHSOM.

⁴ Actually the first layer SOM is the first detail of representation of the dataset, because the preceding layer SOM is just a dummy map.

5.2 Implementation of the GHSOM: Java SOMToolbox

We used the Java SOMToolbox package⁵ to implement the GHSOM algorithm. It contains several modules to train and visualize a SOM. In particular we used the module GHSOM, to grow a hierarchy of maps, and SOMViewer, to visualize the results.

5.3 Experimental results

We used a dataset consisting of 101 animals described in a data space with a dimensionality of 20. The components of this space are simply Boolean values corresponding to the following attributes: hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, 2_legs, 4_legs, 5_legs, 6_legs, 8_legs, tail, domestic, catsize. We ran two tests to show how the hierarchy structure can be shaped by means of the two parameters τ1 and τ2.

Test 1: three-layer hierarchy. Training the GHSOM with the parameters τ1 = 0.070 and τ2 = 0.0035 has produced a hierarchy constituted by three layers, as depicted in Fig. 6. The first layer map has been expanded to form a 2x4 grid. Each unit of this layer is further expanded into a map in the second layer. Some of the units in the second layer maps are further expanded into a third layer. As can be seen from the figure, the algorithm is able to organize the data in a meaningful way. For example, the aquatic animals, the mammals, the birds, etc. are each organized into a separate sub-map in the second layer. In addition, the sub-maps representing related species are close together, like the second-layer sub-map representing the quadruped mammals and that representing the mammals which are not quadrupeds. Fig. 7 shows the component planes of the first layer map units.

⁵ The package is available at the URL http://www.ifs.tuwien.ac.at/dm/somtoolbox/

Fig. 6. Hierarchy produced with the parameters τ1 = 0.070, τ2 = 0.0035. The first layer map has grown to a 2x4 configuration, and a depth of 3 layers has been reached. The figure shows the sub-maps of the hierarchy with the animals mapped onto each of them, together with some of the categories grouped by the GHSOM (aquatic species, birds, quadruped and non-quadruped mammals, six-legged species, etc.).

Fig. 7. Component planes of the first layer map of the GHSOM trained in test 1

Test 2: two-layer hierarchy. Training the GHSOM with the parameters τ1 = 0.025 and τ2 = 0.0035 has produced a hierarchy constituted by two layers, as depicted in Fig. 8. The first layer map has been expanded up to a 4x5 grid. Each unit of this layer is further expanded into a map in the second layer. Fig. 9 shows the component planes of the first layer map units.

Fig. 8. Hierarchy produced with the parameters τ1 = 0.025, τ2 = 0.0035. The first layer map has grown to a 4x5 configuration, and a depth of 2 layers has been reached. The figure shows the second-layer sub-maps with the animals mapped onto each of them.

Fig. 9. Component planes of the first layer map of the GHSOM trained in test 2

ESOM (1)

In the context of data clustering and vector quantization, one of the major challenges is the ability to deal with an online data stream characterized by an unknown or time-dependent statistic. The simplest approach is the k-means in its online version, where for each incoming input data vector x only the prototype vector closest to x is updated, by dragging it nearer to x (Winner-Takes-All scheme). This approach is known as the local k-means algorithm (6). While this method is quite straightforward, it can suffer from confinement to local minima. The SOM algorithm (4; 5) is able to overcome this problem because it uses a soft approach in which not only the winner of the competition is updated but also its neighbors, depending on their proximity to the input vector. In addition, it has the well-known topology-preserving ability, which places the prototype vectors so as to mirror the statistic of the data. The SOM algorithm uses a fixed, predetermined topology of the units in the low-dimensional map space (aka feature space; usually 2-D or 3-D), which defines their order and their neighborhood relationships. When the original manifold is too complicated to be followed by a fixed-topology low-dimensional map space, this leads to a highly folded feature map.

The topology constraint on the feature map is removed in the neural-gas model (7), the dynamic cell structure (DCS-GCS) (8) and the growing neural gas (GNG) (9). In all these methods the map structure is built dynamically to fit the incoming data, but they need to calculate local resources for the prototypes, which increases the computational effort and thus reduces efficiency. The ESOM model is similar to the GNG, but it does not require local resource calculation and its node insertion mechanism is more efficient than those of the DCS and GNG.

6.1 The ESOM algorithm

We start with an empty map, adding new nodes as the input vectors arrive. We will use the following symbols: W(t) = {w1, w2, ..., wN} indicates the set of the prototype vectors of the nodes at the t-th step; N is the current number of nodes; d is the dimension of the input manifold; C is the set of all the unordered pairs (i, j) of nodes i, j which are connected together. The algorithm can be schematized as follows:

1. A new input vector x is presented to the network.
2. Let us consider the activated set S(x) = { w_i : ||x − w_i|| ≤ ε } of the prototype nodes that match the input vector x within a predefined threshold ε.
3. If S(x) is empty go to step 4 (node insertion), otherwise go to step 5 (nodes updating).
4. Node insertion. Create a new node that matches the input vector x exactly, insert it into W and increment N by one:

$$\mathbf{w}_{N+1} = \mathbf{x}, \qquad N \leftarrow N + 1$$

Connect the new node with its two nearest neighbors w_i, w_j (if they exist, that is, if W has at least two elements) and connect them also with each other; if W has only one element, connect only the new node with it:

$$C \leftarrow \begin{cases} C \cup \{(N{+}1, i), (N{+}1, j), (i, j)\} & \text{if } W \text{ has at least two elements} \\ C \cup \{(N{+}1, i)\} & \text{if } W \text{ has only one element} \end{cases}$$

Go to step 6.
5. Nodes updating. Update each node w_i of the activated set S(x) according to the rule

$$\mathbf{w}_i \leftarrow \mathbf{w}_i + \alpha\, h_i\, (\mathbf{x} - \mathbf{w}_i), \qquad i = 1, \dots, |S(\mathbf{x})|$$

where h_i is a neighborhood strength that decreases with the distance between w_i and x.
6. Go back to step 1 until no more data is available.
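A compact MATLAB sketch of one iteration of this loop follows. It is our own simplified reading of the algorithm, not part of the ESOM Toolbox; the match threshold epsilon, the learning rate alpha and the neighborhood width sigma are free parameters, and the Gaussian form of the neighborhood strength is an assumption.

% Hedged sketch of one ESOM online step. W: prototype vectors (one per row);
% C: connection (adjacency) matrix; x: current 1 x dim input sample.
function [W, C] = esom_step(W, C, x, epsilon, alpha, sigma)
    d = sqrt(sum((W - x).^2, 2));                 % distances of x to all current nodes
    S = find(d <= epsilon);                       % activated set: nodes matching x
    if isempty(S)                                 % node insertion
        W = [W; x];                               % new node matches x exactly
        C = blkdiag(C, 0);                        % extend adjacency matrix
        [~, idx] = sort(d);
        nn = idx(1:min(2, numel(idx)));           % up to two nearest existing nodes
        C(end, nn) = 1;  C(nn, end) = 1;          % connect the new node to them
        if numel(nn) == 2, C(nn(1), nn(2)) = 1; C(nn(2), nn(1)) = 1; end
    else                                          % nodes updating (soft, SOM-like)
        h = exp(-d(S).^2 / (2*sigma^2));          % proximity-dependent strength (assumed form)
        W(S, :) = W(S, :) + alpha * (h .* (x - W(S, :)));
    end
end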

6.2 Experimental results

Using the ESOM Toolbox we can see the result of the training applied to the Mackey-Glass data. The dataset is constituted by 200 5-D points taken from the temporal series, that is, the vectors [x(t), x(t−6), x(t−12), x(t−18), x(t+6)]. The output of the toolbox is shown in Fig. 10. From the figure we can see that almost all the nodes are created during the first 100 samples. Obviously, decreasing the error threshold below the value shown in the figure (0.1) results in an increase of the number of nodes evolved.

Fig. 10. The evolved ESOM structure from the Mackey-Glass time series data for the following parameter values: error threshold 0.8, learning rate 0.2 and sigma 0.4. A dataset constituted by 200 samples of 5-dimensional vectors [x(t), x(t−6), x(t−12), x(t−18), x(t+6)] taken from the temporal series has been used. The evolved map is plotted in the two-dimensional space constituted by the first two principal components.

Bibliography
1. D. Deng and N. Kasabov. On-line Pattern Analysis by Evolving Self-Organizing Maps.
2. M. Dittenbach, A. Rauber, D. Merkl. Business, Culture, Politics, and Sports - How to Find Your Way Through a Bulk of News? On Content-Based Hierarchical Structuring and Organization of Large Document Archives.
3. Denny, G. J. Williams, P. Christen. ReDSOM: Relative Density Visualization of Temporal Changes.
4. T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 1982, pp. 59-69.
5. T. Kohonen. The self-organizing map. Proceedings of the IEEE, vol. 78, no. 9, September 1990, pp. 1464-1480.
6. J. L. Marroquin and F. Girosi. Some extensions of the k-means algorithm for image segmentation and pattern classification. Technical Report 1390, 1993.
7. T. M. Martinetz, S. G. Berkovich and K. J. Schulten. "Neural-gas" network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4, 1993.
8. J. Bruske and G. Sommer. Dynamic cell structure learns perfectly topology preserving map. Neural Computation, 7, 1995.
9. B. Fritzke. Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Networks, 7, 1994.
