Re: Regarding the update to TD learning



In general, you can use the HelpConfig buttons on the layerspecs to
find out how these more complex layers work.

"Robert Olivares" <robert.olivares@vanderbilt.edu> writes:
> 1) Why does it have two numbers on each AC unit? (one is the activation --
> the other is a constant offset?) Any docs on the layered AC stuff?

One of the two numbers is the unit's name, which tells you the central
value represented by that unit (the layer uses a coarse-coding scheme to
represent a single graded value).

> 2) What does the new TD unit do? (Just embody the value of delta(t)?)

yep

> 3) How does the value I put into the AC unit for each event plug into the AC
> layer or TD unit?

the first unit of the AC layer represents the actual value decoded
from the coarse-coded representation over the subsequent units -- it
is what is clamped and what you should read out.
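
To make the coarse-coding idea concrete, here is a minimal sketch of
how a single graded value could be spread over a set of units and then
decoded back. It is not the simulator's code: the Gaussian tuning, the
`width` parameter, and the names `CoarseEncode` and `CoarseDecode` are
all assumptions made for illustration. Each unit has a central value
(the "name" shown on the unit), and the decoded value is taken as the
activation-weighted average of those central values.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the simulator's implementation.
// Unit i has a central value centers[i]; its activation falls off as a
// Gaussian of the distance between the encoded value and that center.
std::vector<float> CoarseEncode(float val, const std::vector<float>& centers,
                                float width) {
  std::vector<float> acts(centers.size());
  for (std::size_t i = 0; i < centers.size(); ++i) {
    float d = (val - centers[i]) / width;
    acts[i] = std::exp(-d * d);
  }
  return acts;
}

// Decode a single graded value as the activation-weighted average of the
// units' central values. Returns 0 if no unit is active.
float CoarseDecode(const std::vector<float>& acts,
                   const std::vector<float>& centers) {
  float sum_act = 0.0f;
  float sum_val = 0.0f;
  for (std::size_t i = 0; i < acts.size(); ++i) {
    sum_act += acts[i];
    sum_val += acts[i] * centers[i];
  }
  return (sum_act > 0.0f) ? sum_val / sum_act : 0.0f;
}
```

In this picture, the first unit of the AC layer would hold the decoded
scalar, and the remaining units would hold the distributed pattern.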

> 4) If I wanted to get a measure of predicted reward in the old conditioning
> project in chapter 6 of CECN, I just measured the AC unit activation in the
> minus phase... how would I do this in the new network? Just sum the AC layer
> values?

Just measure the minus phase activation of the first AC unit.

> 5) I found the code for the average reward implementation... could you
> elaborate on this tiny line of code for me? Is u each unit in the AC
> layer... how does avg_dt play into average reward, and how does the td value
> figure into the picture? :)
> 
> u->act_avg += rew.avg_dt * (u->td - u->act_avg); // update the average

Again, everything is computed on the 1st unit: u is the 1st unit, and
its value is then clamped onto a distributed coarse code for that value
on the remainder of the units. td is a bit confusing: here it is the
variable used to hold the reward delivered to the unit.

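  float rval = u->td;     // reward value delivered to the unit
  if(rew.sub_avg) {
    rval -= u->act_avg;   // <- here is where the avg is used!
    // running average: move act_avg a fraction avg_dt toward the reward
    u->act_avg += rew.avg_dt * (u->td - u->act_avg);
  }
  // external input for the unit: discounted activation plus scaled reward
  u->ext = (rew.discount * u->act_eq) + rew.gain * rval;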

				- Randy