Re: Regarding the update to TD learning
In general, you can use the HelpConfig buttons on the layerspecs to
find out how these more complex layers work.
"Robert Olivares" <robert.olivares@vanderbilt.edu> writes:
> 1) Why does it have two numbers on each AC unit? (one is the activation --
> the other is a constant offset?) Any docs on the layered AC stuff?
One is the unit's name, which tells you the central value represented
by that unit (the layer uses a coarse-coding scheme to represent a
single graded value); the other is the activation, as you guessed.
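To make the coarse-coding idea concrete, here is a minimal C++ sketch of how one graded value might be spread over a row of units, each labeled (its "name") by the central value it represents. The Gaussian tuning curve, the evenly spaced centers, and the names `unit_centers`, `coarse_encode` and `sigma` are assumptions for illustration, not the actual PDP++ code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: evenly spaced central values ("names") for n units
// covering the range [lo, hi].
std::vector<double> unit_centers(std::size_t n, double lo, double hi) {
  assert(n >= 2 && hi > lo);
  std::vector<double> centers(n);
  for (std::size_t i = 0; i < n; ++i)
    centers[i] = lo + (hi - lo) * static_cast<double>(i) /
                          static_cast<double>(n - 1);
  return centers;
}

// Hypothetical encoder (Gaussian tuning is an assumption): each unit's
// activation is a bump around its own center, so several neighboring
// units share the job of representing a single value.
std::vector<double> coarse_encode(double value,
                                  const std::vector<double>& centers,
                                  double sigma) {
  assert(sigma > 0.0);
  std::vector<double> acts(centers.size());
  for (std::size_t i = 0; i < centers.size(); ++i) {
    double d = (value - centers[i]) / sigma;
    acts[i] = std::exp(-0.5 * d * d);
  }
  return acts;
}
```

With five units named 0, 0.25, 0.5, 0.75 and 1, encoding 0.5 puts the unit named 0.5 at full activation and gives its two neighbors equal, smaller activations.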
> 2) What does the new TD unit do? (Just embody the value of delta(t)?)
yep
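For reference, the quantity in question is the standard temporal-difference error, delta(t) = r(t) + gamma * V(t+1) - V(t). The sketch below only computes that textbook formula; the function name `td_delta` is hypothetical, and PDP++ may arrive at the same value differently (for example as a difference between plus- and minus-phase activations of the AC unit).

```cpp
#include <cassert>

// Textbook TD error: reward now, plus the discounted prediction for the
// next state, minus the prediction for the current state.
// Hypothetical helper, not the PDP++ TD unit code.
double td_delta(double reward, double v_next, double v_curr, double gamma) {
  assert(gamma >= 0.0 && gamma <= 1.0);
  return reward + gamma * v_next - v_curr;
}
```

A perfectly predicted transition yields zero error, while an unexpected reward with no prior prediction yields a positive delta.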
> 3) How does the value I put into the AC unit for each event plug into the AC
> layer or TD unit?
The first unit of the AC layer represents the actual value, decoded
from the coarse-coded representation over the subsequent units; it
is the unit that gets clamped and the one you should read out.
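The reverse direction, reading a single value back out of the distributed units and storing it on the first unit, can be sketched as an activation-weighted average of the unit centers. The weighted-average decoder, the layout (index 0 = decoded value, indices 1..n = coarse code) and the name `decode_into_first_unit` are assumptions for illustration; the real layer spec may decode differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer layout: layer[0] holds the decoded scalar value,
// layer[1..n] hold the coarse-coded activations, and centers[i] is the
// central value ("name") of layer[i + 1].
void decode_into_first_unit(std::vector<double>& layer,
                            const std::vector<double>& centers) {
  assert(layer.size() == centers.size() + 1);
  double weighted = 0.0, total = 0.0;
  for (std::size_t i = 0; i < centers.size(); ++i) {
    weighted += layer[i + 1] * centers[i];  // activation-weighted center
    total += layer[i + 1];
  }
  // If nothing in the coarse code is active, leave the first unit at 0.
  layer[0] = (total > 0.0) ? weighted / total : 0.0;
}
```

Reading the model's estimate then amounts to looking at `layer[0]` rather than summing the whole layer.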
> 4) If I wanted to get a measure of predicted reward in the old conditioning
> project in chapter 6 of CECN, I just measured the AC unit activation in the
> minus phase... how would I do this in the new network? Just sum the AC layer
> values?
Just measure the minus-phase activation of the first AC unit.
> 5) I found the code for the average reward implementation... could you
> elaborate on this tiny line of code for me? Is u each unit in the AC
> layer... how does avg_dt play into average reward, and how does the td value
> figure into the picture? :)
>
> u->act_avg += rew.avg_dt * (u->td - u->act_avg); // update the average
Again, everything is computed on the first unit: u is the first unit,
and its value is then clamped onto a distributed coarse code for that
value over the remaining units. The name td is a bit confusing; it is
the variable used to hold the reward delivered to the unit.
float rval = u->td;                                   // reward value (td holds the delivered reward)
if(rew.sub_avg) {
  rval -= u->act_avg;                                 // <- here is where the avg is used!
  u->act_avg += rew.avg_dt * (u->td - u->act_avg);    // update the average
}
u->ext = (rew.discount * u->act_eq) + rew.gain * rval;  // value clamped onto the first AC unit
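A self-contained version of the step above makes it easier to see how `avg_dt` turns `act_avg` into an exponentially weighted running average of the delivered reward, which is subtracted before clamping. The structs `RewUnit` and `RewSpec` and the function `apply_reward` are hypothetical stand-ins that only mirror the field names in the snippet; they are not the PDP++ classes.

```cpp
#include <cassert>

// Hypothetical stand-ins mirroring the field names used in the snippet.
struct RewUnit {
  float td = 0.0f;       // reward delivered to the unit (despite the name)
  float act_avg = 0.0f;  // running average of the delivered reward
  float act_eq = 0.0f;   // unit's current activation
  float ext = 0.0f;      // value to be clamped onto the unit
};

struct RewSpec {
  bool sub_avg = true;    // subtract the average reward?
  float avg_dt = 0.1f;    // rate at which the average tracks new rewards
  float discount = 0.9f;  // weight on the current activation
  float gain = 1.0f;      // scale on the (possibly average-subtracted) reward
};

// Same logic as the snippet: subtract the *old* average, then move the
// average a fraction avg_dt of the way toward the new reward.
void apply_reward(RewUnit* u, const RewSpec& rew) {
  assert(rew.avg_dt >= 0.0f && rew.avg_dt <= 1.0f);
  float rval = u->td;  // reward value
  if (rew.sub_avg) {
    rval -= u->act_avg;                               // use the old average
    u->act_avg += rew.avg_dt * (u->td - u->act_avg);  // update the average
  }
  u->ext = (rew.discount * u->act_eq) + rew.gain * rval;
}
```

With a constant reward of 1 on every event, `act_avg` climbs toward 1 and the reward term in `ext` shrinks toward 0, so only rewards that differ from the recent average contribute to the clamped value.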
- Randy