[pdp-discuss] TD learning in PDP

M Snel s0675643 at sms.ed.ac.uk
Tue Apr 10 17:54:13 MDT 2007


Hi,

I am trying to construct a model of TD learning in a simple navigation
task. The network should learn to select the optimal action for reaching
a goal state; thus, the weights between the inputs (encoding location in
the environment) and the outputs (encoding navigational actions) should
be updated based on reward.

I have connected the input units to the predicted-reward layer and
clamp the external reward when the network reaches the goal state. With
this setup the network correctly learns the "value" of each input unit
(i.e., higher expected reward closer to the goal). I have also connected
the TDlayer to the action units (DaModUnits) and turned on Da modulation
and "p dwt" in those units so that they learn from the modulation.
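For reference, the value-learning (critic) part of this setup amounts to
tabular TD(0): V(s) += alpha * [r + gamma*V(s') - V(s)]. Below is a
minimal Python/NumPy sketch, assuming a hypothetical 5-state corridor,
one localist input unit per location, a fixed "always step right" policy,
and made-up parameters (GAMMA, ALPHA). It illustrates the update rule
only and is not the PDP++ implementation.

```python
import numpy as np

N_STATES = 5   # hypothetical corridor: states 0..4, goal at state 4
GAMMA = 0.9    # discount factor (assumed value)
ALPHA = 0.1    # learning rate (assumed value)

def one_hot(s):
    """Localist input encoding: one input unit per location."""
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

def train_critic(episodes=500):
    w = np.zeros(N_STATES)  # input -> predicted-reward weights
    goal = N_STATES - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            s_next = s + 1                         # fixed policy: always step right
            r = 1.0 if s_next == goal else 0.0     # external reward clamped at the goal
            v = w @ one_hot(s)                     # predicted reward for current state
            v_next = 0.0 if s_next == goal else w @ one_hot(s_next)
            delta = r + GAMMA * v_next - v         # TD error
            w += ALPHA * delta * one_hot(s)        # update only the active input's weight
            s = s_next
    return w

values = train_critic()
print(np.round(values, 2))
```

The learned values should rise toward the goal (approaching gamma^k for
a state k steps before the goal), which matches the "higher expected
reward closer to goal" pattern described above; the goal state itself
is terminal and keeps a value of 0.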

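However, the action units do not learn as I expected: they fail to map
each input state to the correct action. I was assuming that in PDP the
modulation from the TDlayer would be "interpreted" by the DaModUnits as
feedback on the PREVIOUS action (as per the "p dwt" parameter). Is this
correct? Also, do the units learn using the Da modulation directly or by
a difference in Da modulations from one timestep to the next?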
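To make the question concrete, here is a generic actor-critic sketch in
Python/NumPy. The TD error delta = r + gamma*V(s') - V(s) only becomes
available after a transition, so it is credited to the state-action pair
(s, a) that produced it, i.e. the previous action. Note that delta is
already a difference of successive value predictions; in this scheme the
actor uses delta directly, not a difference of successive delta values.
This is a textbook actor-critic with a hypothetical corridor and assumed
parameters, not a description of how DaModUnit or "p dwt" actually
behave in PDP++.

```python
import numpy as np

N_STATES = 5                               # hypothetical corridor, goal at the right end
ACTIONS = (-1, +1)                         # index 0: step left, index 1: step right
GAMMA, ALPHA_V, ALPHA_PI = 0.9, 0.1, 0.1   # assumed parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_actor_critic(episodes=2000, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(N_STATES)                   # critic: input -> predicted reward
    pi = np.zeros((N_STATES, len(ACTIONS)))  # actor: input -> action-unit preferences
    goal = N_STATES - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            probs = softmax(pi[s])
            a = rng.choice(len(ACTIONS), p=probs)          # action taken in state s
            s_next = min(max(s + ACTIONS[a], 0), goal)     # walls at both ends
            r = 1.0 if s_next == goal else 0.0
            v_next = 0.0 if s_next == goal else v[s_next]
            delta = r + GAMMA * v_next - v[s]              # TD error ("Da" signal)
            v[s] += ALPHA_V * delta                        # critic update
            # The TD error that arrives after the transition is credited to
            # the PREVIOUS state-action pair (s, a), using delta directly.
            pi[s, a] += ALPHA_PI * delta
            s = s_next
    return v, pi

v, pi = train_actor_critic()
policy = pi.argmax(axis=1)[:-1]  # greedy action in each non-goal state
print(policy)
```

With these settings the greedy policy should settle on "step right"
(index 1) in every non-goal state. If the TD error were instead applied
to the action chosen in the NEW state s', the credit would land on the
wrong action, which is one way an actor can fail to learn the mapping.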

Thanks,
Matthijs


