[pdp-discuss] TD learning in PDP
M Snel
s0675643 at sms.ed.ac.uk
Tue Apr 10 17:54:13 MDT 2007
Hi,
I am trying to construct a model of TD learning in a simple navigation
task. The network should learn to select the optimal action to get to a
goal state; thus, the weights between inputs (encoding location in the
environment) and outputs (encoding navigational actions) should be
updated based on reward.
I have connected the input units to the predicted reward layer, and
clamp the external reward upon reaching the goal state. By this
construction the network accurately learns to represent the "value" of
each input unit (i.e. higher expected reward closer to goal). I have
connected the TDlayer to the DaModUnit action units and have turned on
the Da modulation and "p dwt" in those units so that they should learn
from the modulation.
However, results for learning in the action units are not as I
expected: the action units don't learn to map an input state to a
correct action. I was assuming that in PDP the modulation from the
TDlayer would be "interpreted" by the DaModUnits as feedback on the
PREVIOUS action (as per the "p dwt" parameter). Is this correct? Also,
do the units learn using the Da modulation directly or by a difference
in Da modulations from one timestep to the next?
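To make the assumption in the question concrete, here is a minimal sketch of a generic tabular actor-critic in Python. It is not the PDP++ implementation and says nothing about how DaModUnits or "p dwt" actually behave. The corridor environment, the function name train_actor_critic, and all parameter values are hypothetical and chosen only for illustration. The sketch shows the TD error computed after a transition being used directly, as a single scalar, to update both the value of the previous state and the preference for the action just taken.

```python
import math
import random


def train_actor_critic(n_states=5, episodes=500, alpha=0.1, beta=0.1,
                       gamma=0.9, seed=0):
    """Tabular actor-critic on a hypothetical 1-D corridor.

    States are 0..n_states-1 and the goal is the last state.
    Actions: 0 = step left, 1 = step right.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    values = [0.0] * n_states                      # critic: V(s)
    prefs = [[0.0, 0.0] for _ in range(n_states)]  # actor: action preferences

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so an episode always terminates
            # Two-action softmax; the difference is clipped to keep exp() safe.
            diff = prefs[state][0] - prefs[state][1]
            p_right = 1.0 / (1.0 + math.exp(max(-50.0, min(50.0, diff))))
            action = 1 if rng.random() < p_right else 0

            next_state = state + 1 if action == 1 else max(0, state - 1)
            reward = 1.0 if next_state == goal else 0.0
            next_value = 0.0 if next_state == goal else values[next_state]

            # TD error: computed after the transition, so it evaluates
            # the action that was just taken in the previous state.
            delta = reward + gamma * next_value - values[state]

            values[state] += alpha * delta         # critic update
            prefs[state][action] += beta * delta   # actor update, same delta

            state = next_state
            if state == goal:
                break
    return values, prefs


if __name__ == "__main__":
    values, prefs = train_actor_critic()
    for s, (v, p) in enumerate(zip(values, prefs)):
        print(s, round(v, 3), "right" if p[1] > p[0] else "left")
```

In this sketch the actor uses delta itself, not a difference between successive delta values. Whether the DaModUnits do the same is exactly what the question above asks.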
Thanks,
Matthijs