SOM behaviour (was Re: Seeding the random number generator)





On Tue, 8 Oct 2002, Randall C. O'Reilly wrote:

> I'll leave it to you to figure out what's going on -- I don't think
> its a bug in the code -- lots of people have used this code for
> various projects w/out noticing such a bug.. btw, what I meant by
> hidden layer was the SOM layer..

I think I've sussed it, with help from someone who replied to a Usenet
posting. I'm running some experiments to test this theory empirically.

It's actually quite simple. When the weights of a map unit are updated
after an input is presented, the influence of the initial weights is
diminished somewhat. Normally, the neighbourhood and learning rate
are shrunk quickly enough, and the data set is small enough, that this
doesn't matter.

However in my case:

* I had a 5x5 network with 220,000 input patterns.

* I was updating the weights after each input.

* My initial learning rate was 0.9, reducing to 0.75 over 100
  iterations, and zero over 400 iterations.

* The neighbourhood covered the entire map for the first 100
  iterations.

Thus during those first 100 iterations, every unit was updated on each
weight update, by an amount determined by how far it was from the
winner. On average that amount would still be quite high: the
effective learning rate was roughly 50% of the learning rate used for
the winning unit. And that involved more than 220,000 updates per
epoch...

Thus the influence of the initial weights was diminishing rapidly
during the period when all units were responding to every input.
Eventually it drops below the floating-point precision of the computer
and is eliminated completely. At that point, if my theory is correct,
the network converges on the trajectory it would follow if the
initial weights had all been set to zero.
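
As a rough check on the figures above, here is a minimal Python sketch
of how fast the initial weights fade. It assumes the usual SOM update
w <- w + lr*nbhd*(x - w), under which each update multiplies the
initial weight's contribution by (1 - lr*nbhd). It uses the initial
learning rate of 0.9 and the estimated 50% average neighbourhood
factor, and assumes double-precision weights. The function names are
only illustrative.

```python
import math
import sys

def residual_influence(lr, nbhd, n_updates):
    """Fraction of the initial weight left after n_updates updates.

    Assumes the rule w <- w + lr*nbhd*(x - w), which scales the
    initial weight's contribution by (1 - lr*nbhd) on every update.
    """
    return (1.0 - lr * nbhd) ** n_updates

def updates_until_lost(lr, nbhd, eps=sys.float_info.epsilon):
    """Smallest number of updates after which the residual is below eps."""
    return math.ceil(math.log(eps) / math.log(1.0 - lr * nbhd))

n = updates_until_lost(0.9, 0.5)
print(n, residual_influence(0.9, 0.5, n))
print(residual_influence(0.9, 0.5, 220_000))  # underflows to 0.0
```

Under these assumptions the residual falls below double-precision
epsilon after roughly 60 updates, a tiny fraction of a single epoch of
220,000 patterns.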

To help in thinking about this, consider what would happen if you set
the learning rate to 1 and updated all units in the map as if they
were the winning unit. This should eliminate the influence of the
initial weights after the first pattern has been presented.

Now start again with the learning rate set to 0.99. In a sense 99% of
the influence is eliminated on the first step, 99% of the remainder on
the next, and so on, so that only a fraction 0.01^n is left after n
steps.
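
The thought experiment can be tried directly. The sketch below is
only an illustration under simplifying assumptions: each unit has a
single scalar weight, every unit is treated as the winner on every
input, and the input values and starting weights are made up. It
trains two units from different starting weights on the same inputs
and records the gap between them after each step.

```python
def update(w, x, lr):
    # Written as a weighted average so that lr = 1.0 yields exactly x.
    return (1.0 - lr) * w + lr * x

def gaps_during_training(w_a, w_b, inputs, lr):
    """Train two units from different starting weights on the same
    inputs, treating each as the winner every time, and return the
    absolute gap between them after each step."""
    gaps = []
    for x in inputs:
        w_a = update(w_a, x, lr)
        w_b = update(w_b, x, lr)
        gaps.append(abs(w_a - w_b))
    return gaps

# Arbitrary illustrative inputs and starting weights.
inputs = [0.2, 0.7, 0.4, 0.9, 0.1, 0.6, 0.3, 0.8, 0.5, 0.25]
gaps_full = gaps_during_training(0.05, 0.95, inputs, lr=1.0)
gaps_99 = gaps_during_training(0.05, 0.95, inputs, lr=0.99)
print(gaps_full[:3])
print(gaps_99[:3])
```

With a learning rate of 1 the gap is zero after the first pattern.
With 0.99 it shrinks by about a factor of 100 per step, so it reaches
the rounding level within a handful of patterns.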

It should be possible to reproduce this behaviour with any data set
and a sufficiently long training run or high learning rate, so long as
you ensure every unit in the map responds to every input. One can even
relax the last condition by ensuring that every unit responds to every
input at least once every N iterations, but that lengthens the time
required.
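
To get a feel for how much the relaxed condition lengthens things,
here is a small sketch under the same simplified model as before: a
unit's initial weight is scaled by (1 - lr) each time the unit is
updated, but the unit is only updated on every N-th input. Here lr
stands for the effective rate (learning rate times neighbourhood
factor), and 0.45 is just the 0.9 x 50% figure from earlier. The
function name is hypothetical.

```python
import sys

def presentations_until_lost(lr, every_n, eps=sys.float_info.epsilon):
    """Count input presentations until a unit's initial weight has
    faded below eps, when the unit is only updated on every
    every_n-th input."""
    residual = 1.0
    t = 0
    while residual >= eps:
        t += 1
        if t % every_n == 0:      # the unit responds to this input
            residual *= 1.0 - lr  # initial weight's share shrinks
    return t

for n in (1, 10, 100):
    print(n, presentations_until_lost(0.45, n))
```

In this model the number of presentations needed simply scales with
N, which is why the effect should still appear, only later.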

James