Regarding the update to TD learning



> ftp://grey.colorado.edu/pub/oreilly/misc/lba_rl.tar.gz
>
> just run new_rl_cond.proj.gz with the new version of leabra++ from the
> src tar above.

I've gotten it compiled under Cygwin and have my model set up based on this
file. However, I'm confused about the new AC/TD layer:

1) Why does each AC unit show two numbers? (Is one the activation and the
other a constant offset?) Are there any docs on the layered AC stuff?
2) What does the new TD unit do? (Just embody the value of delta(t)?)
3) How does the value I put into the AC unit for each event plug into the AC
layer or TD unit?
4) If I wanted to get a measure of predicted reward in the old conditioning
project in chapter 6 of CECN, I just measured the AC unit activation in the
minus phase. How would I do this in the new network? Just sum the AC layer
values?
5) I found the code for the average reward implementation. Could you
elaborate on this one line of code for me? Is u each unit in the AC
layer? How does avg_dt play into the average reward, and how does the td
value figure into the picture? :)

u->act_avg += rew.avg_dt * (u->td - u->act_avg); // update the average
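
For reference, here is how I currently read that line: it looks like a
plain exponential running average that moves act_avg a fraction avg_dt of
the way toward the current td value. The sketch below is only my guess at
the arithmetic in isolation. The Unit and RewSpec structs and the
update_avg function are hypothetical stand-ins I made up, not the real
leabra++ types, and whether u really is each unit in the AC layer is
exactly what I'm unsure about.

```cpp
#include <cassert>

// Hypothetical stand-in for the spec object that holds avg_dt;
// not the actual leabra++ class.
struct RewSpec {
  float avg_dt;  // rate constant of the running average (assumed 0 < avg_dt <= 1)
};

// Hypothetical stand-in for a unit; only the two fields used by the
// quoted line are modeled.
struct Unit {
  float td = 0.0f;       // TD value on this unit for the current step
  float act_avg = 0.0f;  // running average being maintained
};

// Same arithmetic as the quoted line: move act_avg a fraction avg_dt
// of the way from its old value toward the current td value.
inline void update_avg(Unit* u, const RewSpec& rew) {
  assert(rew.avg_dt > 0.0f && rew.avg_dt <= 1.0f);
  u->act_avg += rew.avg_dt * (u->td - u->act_avg);
}
```

If that reading is right, then with avg_dt = 0.5 and td held at 1, act_avg
would go 0.5, 0.75, ... and approach 1, and a smaller avg_dt would just
make the average respond more slowly.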

Almost there!
-Robert