From allan.randall at ntt.ca Thu Feb 1 09:35:48 2007
From: allan.randall at ntt.ca (allan.randall@ntt.ca)
Date: Thu Feb 1 09:35:56 2007
Subject: [pdp-discuss] TD demo and delayed rewards
Message-ID: <200702011635.l11GZnP5016601@mail27.atl.registeredsite.com>

The simple TD demo in "Explorations" (p 199) is a really nice way of
showing the basic algorithm at work. Even the serial compound "cheat" I
find is readily accepted when I explain the demo to others. The bigger
problem is the fact that the stimulus must be maintained up to the point of
reward... this is readily apparent when you show the demo, and makes it
look like it is not really detecting delayed reward at all.

What would be the easiest way to fix this, without destroying the simple,
pedagogically clean presentation of the demo? The active memory model (p
307) gets into numerous other things, and is no longer just a demo of the
basic TD algorithm.

I'm guessing the problem could be fixed with a simpler modification of the
basic TD demo... but maybe that will get me into trouble? Anyone try this?

I'm not questioning the value of the algorithm... I understand why the demo
makes the compromise it does. I'm just trying to respond to objections that
I have heard raised by those watching the demo, without getting into a
whole other more complicated demo.

Any thoughts would be appreciated.

Cheers,

Allan Randall, NTT Systems Inc.

From Randy.OReilly at colorado.edu Thu Feb 1 11:45:42 2007
From: Randy.OReilly at colorado.edu (Randall C. O'Reilly)
Date: Thu Feb 1 11:45:52 2007
Subject: [pdp-discuss] TD demo and delayed rewards
In-Reply-To: <200702011635.l11GZnP5016601@mail27.atl.registeredsite.com>
References: <200702011635.l11GZnP5016601@mail27.atl.registeredsite.com>
Message-ID: <200702011345.42506.Randy.OReilly@colorado.edu>

I'm not sure there is a simple solution to this problem.
It is clear that in the brain, learning from delayed rewards ("trace"
conditioning) depends on prefrontal cortex + hippocampus. If you want a
reasonable model of these systems, it is going to be a bit more complex
than a simple demo.

Our recent "PBWM" (prefrontal cortex basal ganglia working memory) model
(O'Reilly & Frank, 2006 -- available under Online Papers on my webpage) is
the latest incarnation of our thinking about how the PFC solves this
problem in conjunction with TD-like learning. Also see the in-press PVLV
paper for more details on the TD-like part of it.

- Randy

On Thursday 01 February 2007 11:35, allan.randall@ntt.ca wrote:
> The simple TD demo in "Explorations" (p 199) is a really nice way of
> showing the basic algorithm at work. Even the serial compound "cheat" I
> find is readily accepted when I explain the demo to others. The bigger
> problem is the fact that the stimulus must be maintained up to the point of
> reward... this is readily apparent when you show the demo, and makes it
> look like it is not really detecting delayed reward at all.
>
> What would be the easiest way to fix this, without destroying the simple,
> pedagogically clean presentation of the demo? The active memory model (p
> 307) gets into numerous other things, and is no longer just a demo of the
> basic TD algorithm.
>
> I'm guessing the problem could be fixed with a simpler modification of the
> basic TD demo... but maybe that will get me into trouble? Anyone try this?
>
> I'm not questioning the value of the algorithm... I understand why the demo
> makes the compromise it does. I'm just trying to respond to objections that
> I have heard raised by those watching the demo, without getting into a
> whole other more complicated demo.
>
> Any thoughts would be appreciated.
>
> Cheers,
>
> Allan Randall, NTT Systems Inc.
>
> _______________________________________________
> PDP-Discuss mailing list
> PDP-Discuss@psych.Colorado.EDU
> http://psych.colorado.edu/mailman/listinfo/pdp-discuss

From ftmleone at hotmail.com Mon Feb 5 02:26:55 2007
From: ftmleone at hotmail.com (Frank Leoné)
Date: Mon Feb 5 02:27:01 2007
Subject: [PDP-discuss] Using input layer for both input and target?
Message-ID:

Dear all,

If I understand the recirculation algorithm, which is encapsulated in
Leabra, correctly, it should be possible to use the input layer as the
target layer. Is this correct? And can this also be done using PDP++ and
Leabra?

To clarify it a bit: I want to be able to use only an input and a hidden
layer, reciprocally connected. At timestep 0, the input is fed in. At all
other timesteps no input is given, though the network should remember the
old input in the activations of the hidden layer and send it back to the
input layer. So after the first timestep, I want to present the old input
only as a target to the input layer.

Thanks in advance!

with kind regards,

Frank

From Randy.OReilly at colorado.edu Mon Feb 5 18:41:53 2007
From: Randy.OReilly at colorado.edu (Randall C. O'Reilly)
Date: Mon Feb 5 18:41:50 2007
Subject: [PDP-discuss] Using input layer for both input and target?
In-Reply-To:
References:
Message-ID: <200702052041.53042.Randy.OReilly@colorado.edu>

Yep, this should be possible -- I played around with it a while ago. Set
the phase_order to either:

  MINUS_PLUS_NOTHING,  // auto-encoder version with final 'nothing' minus phase
  PLUS_NOTHING,        // just the auto-encoder (no initial minus phase)

I haven't tested it recently, so I can't guarantee it clears out the
external inputs in the NOTHING phase properly, but it did at one point.

- Randy

On Monday 05 February 2007 04:26, Frank Leoné
wrote:
> Dear all,
>
> If I understand the recirculation algorithm, which is encapsulated in
> Leabra, correctly, it should be possible to use the input layer as the
> target layer. Is this correct? And can this also be done using PDP++ and
> Leabra?
>
> To clarify it a bit: I want to be able to use only an input and a hidden
> layer, reciprocally connected. At timestep 0, the input is fed in. At all
> other timesteps no input is given, though the network should remember the
> old input in the activations of the hidden layer and send it back to the
> input layer. So after the first timestep, I want to present the old input
> only as a target to the input layer.
>
> Thanks in advance!
>
> with kind regards,
>
> Frank

From Randy.OReilly at colorado.edu Mon Feb 5 18:59:01 2007
From: Randy.OReilly at colorado.edu (Randall C. O'Reilly)
Date: Mon Feb 5 18:58:58 2007
Subject: [pdp-discuss] Leabra: learning continous values over time?
In-Reply-To:
References:
Message-ID: <200702052059.01762.Randy.OReilly@colorado.edu>

Frank,

The parameters on the scalar val layer are important (as you discovered).
I did a fair bit of testing with special projects that exercise the system
pretty well, available at:

ftp://grey.colorado.edu/pub/oreilly/misc/gaus22_scalar_test_01.proj.gz

and loc_scalar_test_05.proj.gz

Both use the wt_sig gain, off = 1.0, 1.0, as you discovered, and perhaps
some other special params that you might want to check out. The loc project
is used for representing 0-1 values with only 3 units (0, .5, 1), which we
use in our PVLV model.

As for the nothing question: excellent point and currently it is not supported.
I need to add that functionality. Given the way things work, with only a
floating-point number to go on, I guess we would just give the spec a magic
number that you enter as the signal for nothing; when the input has that
value, nothing is presented to the layer. I will put this in pdp++ 4.0 --
not sure I'll back-port it, but I could if you really want it now.

- Randy

On Wednesday 31 January 2007 11:52, Frank Leoné wrote:
> Hi all,
>
> Again I want to express my gratitude: it now works! Especially this layer
> type, which I had not discovered before, really opens new interesting
> possibilities, I like it!
>
> It does not work flawlessly though: the error first goes down quite fast,
> but after some time rises even faster, to really high levels, just to go
> down again (not as far down as it went previously) and to rise later on,
> etcetera. Needless to say, it is quite hard to reach low error levels with
> this kind of behaviour. Does anyone have an idea what the reason might be?
> I'm using a leabra network with a context layer (made with the wizard).
>
> Also I have one other minor problem: for the learning over time I want the
> retinal input to switch off after the first trial. So, the first trial the
> input is a value between a minimum and a maximum, converted into a
> distributed code. But in the rest of the trials no such input should be
> given: the input layer should give a zero input. But if I enter zero as
> input, it is converted to the center of my distributed code, which is
> of course normally the correct behavior. So I thought: let's take a number
> far outside the borders, but that point is just moved to the border itself.
>
> So: how can I input nothing into a scalar val layer? I now fixed it by just
> clamping the entire pattern, but it would be nice to use the scalar val
> layer again.
>
> Thanks in advance!
>
> Frank
>
> PS. I can't seem to reach the latest archives. Is there something wrong or
> is it just me?
> :)
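
The "magic number" idea from the last reply in this thread can be sketched roughly as follows. This is only an illustration in Python, not PDP++ code: the function name encode_scalar, the sentinel NOTHING, the Gaussian bump coding, and all default parameter values are assumptions made for the example. It reproduces the two behaviours described in the thread (out-of-range values are moved to the border; the sentinel yields no input at all).

```python
import math

# Hypothetical sentinel ("magic number") meaning "present no input".
# The name and value are assumptions for this sketch, not a PDP++ setting.
NOTHING = -999.0


def encode_scalar(value, n_units=11, lo=0.0, hi=1.0, sigma=0.1,
                  nothing=NOTHING):
    """Code a scalar as a Gaussian bump of activity over n_units units.

    Unit i prefers the value lo + i * (hi - lo) / (n_units - 1).
    If value equals the sentinel, every unit is left at zero.
    """
    if value == nothing:
        # Sentinel seen: switch the input off instead of encoding a value.
        return [0.0] * n_units
    # Values outside [lo, hi] are moved to the nearest border.
    clipped = min(max(value, lo), hi)
    step = (hi - lo) / (n_units - 1)
    return [
        math.exp(-0.5 * ((lo + i * step - clipped) / sigma) ** 2)
        for i in range(n_units)
    ]


if __name__ == "__main__":
    print(encode_scalar(0.5))      # bump centred on the middle unit
    print(encode_scalar(5.0))      # clipped to the upper border
    print(encode_scalar(NOTHING))  # all zeros: no input
```

The sentinel has to lie well outside the representable range so that it cannot collide with a legitimate input; a separate on/off flag per pattern would avoid that constraint, but the reply above notes that only a single floating-point number is available per input.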