EQUINE CLICKER TRAINING.....
using precision and positive reinforcement to teach horses and people

Reward Ends, Then What?

This year at Clicker Expo (2014), they were selling access to on-line videos of some of the presentations. Buying one is a great option if you have some time slots where you can't choose which one to attend, or if you want to be able to see a session a few times. Of course, then you still have to decide which one to pick. I chose Eva Bertilsson and Emelie Johnson Vegh’s presentation titled “Reward ends, then what?” Here are my notes. At the end I will share some ideas about how to apply this to horse training.

In this presentation, they were looking specifically at the part of the learning cycle that is between the end of the reward and the beginning of the next behavior. They chose this aspect because it is often overlooked, and they have found that they can significantly improve their training by being more aware of what the dog does immediately after the reward. Of course that’s probably true of any aspect of training, because attention to detail matters, but I think it’s great that experienced trainers are looking at and sharing details about some of the parts that novices tend to overlook.

The reward as part of the learning cycle:

They started by talking about the learning cycle which is one name for the Antecedent -> Behavior -> Consequence -> Antecedent ->… loop that we use to describe behavior. Because it’s a cycle and behavior occurs as a never-ending stream, there is no clear start or end. This means that, depending upon the focus of your training, you can jump in at different places.

Eva and Emelie find it useful to use the reward (reinforcer) as their starting point. They listed some of the reasons they choose to do this:

· What happens after the reward has a lot to do with building focus and enthusiasm.

· Problems can occur if the animal adds in extra behaviors between the reward and the target behavior, because they get reinforced along with the one you want, and you get unwanted chains.

· Starting with a tight reward loop (more on this later) allows you to build clean behavior by weeding out extra behavior, right from the beginning.

· It also forces you to think about what you want to have happen right after the reward. Do you want the animal to immediately repeat what was last clicked, look at you, or do something else? Different situations might require different options and you need to think about this ahead of time.

With novice animals, they recommend that the first and tightest loop you should teach is a really long reward with no breaks where other behavior could creep in. The animal should be so busy getting and eating the reward that this is the only behavior that is happening. An example with a food reward would be having enough food that the animal could eat continuously. Eva gave an example with the giraffes of holding a carrot and just letting the giraffe chew on it. With a toy, you could let the dog tug without interruption. Then you would add some gaps so that there were slight pauses between rewards and then you would start to insert behaviors between rewards. They wrote it out like this:

1. Rrreeewwwwaaaarrrrddd…

2. reward -> reward -> reward -> reward-> …

3. reward -> behavior -> reward -> reward -> behavior -> …

By building the loop in this way, the animal learns to expect the next reward, so if you add a slight pause after delivering one reward, the animal is going to do some behavior to get the next reward. You are less likely to get an animal that loses interest or gets distracted. The animal will remain focused on you and if you set up your antecedents carefully, you can start to get small bits of clean behavior between rewards.

This is important and goes back to something they mentioned earlier in the talk which was that the reward itself is an antecedent for another behavior. For many dogs, eating a cookie becomes an antecedent for the next behavior, which is often looking at the handler for the next cue. If you want the dog to do this, then it’s great. Eva had examples of times when this works for her. But what if you don’t want the dog to do this? In a shaping session, you might want the dog to just repeat what was last clicked. If the dog always looks at you for information after getting its reward, then you can get unwanted behavior chains that will interfere with the behavior you are trying to shape. So you need to think about getting the reward as a behavior in itself, with its own antecedent and consequence.

If you think about getting the reward as a behavior in itself, then it makes sense to start by just practicing getting the reward and going from reward to reward. The dog is not just eating or playing. It is learning what to do between rewards, which can then be used to teach what to do after the reward. They also pointed out that reward followed by reward builds anticipation and can be used to increase the value of different types of rewards. If the dog likes play but is somewhat indifferent to treats, the trainer can mix up rewards to increase the value of the treats by doing treat -> play -> treat -> play -> play -> treat. If the treat predicts play, then the value of the treat will increase.

Someone in the audience asked about if and when you click for any of this. When they are just giving rewards that the animal likes, they don’t click. But if they are using different rewards to increase the value of one, then a click might be helpful. Eva was training her dog to play with a pull toy by putting chicken in a sock. She would offer a cookie (a preferred treat) and then the sock. If the dog licked or interacted with the sock in an appropriate way (I don’t remember her criteria), she clicked and opened it so the dog could get the chicken. This was one way to build enthusiasm for the sock/pull toy and it was helpful to click to mark the kinds of interactions she wanted.

They talked about a few different things to consider when looking at what happens after the reward. This included rewards given during training sessions, at the end of training sessions, and before "breaks" where you are still in a training session, but not actively training for a few moments.

Ending Training Sessions:

During a training session, each reward is followed by a behavior or another reward. But what about the last reward? Most of us have probably experienced working with an animal that doesn’t want a training session to end. Are there ways to handle this so that the animal is ok with the session ending?

One point they made was that, for the dog, there is no “end,” because the behavior you do at the end of a session is going to be the antecedent for another behavior. We may think of it as the end of a training session, but it is not the end of the dog doing behavior. If Eva ends a session by leaving (as she did with the giraffes), her departure is going to be the antecedent for another behavior or behaviors, and these will have both respondent and operant components.

Therefore, you need to think carefully about what you want the animal to do when it gets the “end of session cue.” What behavior would you like to see? How can you set up the environment so that behavior is more likely? If you are returning the animal to its normal living environment, then it’s important that there are reinforcers available (for the behaviors you want) in that environment.

If Eva is working with the giraffes and ends a session by leaving, she will make sure that there is reinforcement available for appropriate behaviors. The giraffes could have some branches to browse on or other foraging activity. The same thing applies to working with horses. If you take your horse out for a training session, you want to put him back into an environment that includes reinforcement for behaviors that are appropriate for that environment. It could be a pile of hay, access to pasture or friends, etc… With a dog, it might be giving the dog a food toy, kong, bone, or providing some other activity like going for a walk. Emelie’s dog lives in an environment (house with kids) that includes other reinforcers so the dog has something to do when a training session ends.

Part of what makes the above easier is if the dog knows what reinforcement is available for what behavior in each situation. That’s why they said it can be helpful to use different reinforcers for behaviors that are desirable out of training sessions. It just makes things a little clearer to the animal.

The same idea can be applied to training. It can be easier for the dog to transition if you create a physical environment for your training sessions so that you have clearly defined areas for different activities. They like to have a training environment that includes a station and a working area. They will start with the dog on station. This could be a mat or a place (in the car, crate, etc…). The dog uses this as a home base and returns to it between work sessions.

So the session would look like this:

Dog on station
Dog goes to training area – this can be done by cueing the dog to come or by using “transport” which is feeding the dog a cookie while it is walking with you. They call this using a “treat magnet.”
Training interval occurs (session is short and carefully planned)
Dog goes back to station using a treat magnet/transport
This process can be repeated for as many training intervals as you want to do.
After the final training interval, the dog is taken to some other location using transport/treat magnet where reinforcement is available (usually something different).

Emelie pointed out that a nice by-product of this structure is that if you take the dog to the training area and it’s not going well, you just transport it back to station and can have some time to regroup. It gives you an easy way to end a session without the dog feeling that the session ended abruptly. If your dog shows reluctance to go to station, then you can insert going to station for very short periods of time so that going to station does not become something the dog wants to avoid.

Choosing a Clever Starting Point:

· Know your goals

· Decide from what starting points your dog should be able to do the behavior (sit from stand is different than sit from down)

· If you want the dog to find the “starting point,” then you need to teach the dog how and when to correct himself

· A session can have the same or multiple starting points.

Knowing your goals includes what you want the behavior to look like and when you want to be able to ask for it. Are you only going to ask for the behavior when the dog is in position or do you want the dog to be able to know how to do a behavior from different starting points? In this context, the starting point refers to where the dog is immediately after the reward is delivered.

Eva had a video showing some heeling practice where she was teaching a dog to find heel position, first from very close to heel position, and then from farther away and at odd angles. She has found that teaching this ability to find heel from many different starting points means that if the dog does get out of heel position (for whatever reason), it knows how to get back to it.

The starting point could also be the behavior you want the dog to do immediately after it gets its reward. If you are free shaping, you might want the dog to return to the same starting point after each reward. But if you are working on multiple behaviors or chains, you also have the option of having different starting points. They showed a video of shaping a dog to go under a chair. The reward was delivered so that each time the dog got rewarded, it encouraged a little forward movement along the desired line of travel (under the chair).

They use the term "starting point" a lot and it usually means where the dog is after getting its reward. These examples show that you could choose random starting points (as in the heel example) that take a dog out of position, so that the dog learns to find its way back. Or, as in the shaping example, you could choose a consistent starting point that sets the dog up to do the next behavior easily. Which one you choose depends upon the focus of your training session. They talk more about starting points in the next section.

Working with one tiny slice at a time:

Another part of choosing a clever starting point is taking behavior and “working with one tiny slice at a time.” When you slice behavior very thin, you have several choices of how to set up the reward to get to the same or different starting points. They listed the following possibilities:

· Forward chaining principle – use the same starting point every time, but stretch the end point

· Backchaining principle – same ending point each time, but stretch the starting point away from the ending point (add more behavior before the end point)

· Fluctuating starting points – starting point changes each time. The example of this was teaching a dog to go under a chair by feeding slightly forward so the dog moves in the desired direction.

· Stay in the same place/position – the starting point is the same as the ending point. The example for this one was heeling, where the dog starts and ends in heel position. They did note that this is one behavior where you want to be careful about building a chain of starting in heel, looking away (or some other unwanted behavior) and returning to heel.

The key point here was to really think about where you want the dog to be when you end the reward. Plan ahead. Do you want to use the reward to reposition the dog? To have the dog stay in position? Or to do something else?

When they started to pay more attention to what happens when reward ends, they realized the value of long lasting rewards. Long lasting rewards give you time and/or space. In that time you can move the dog, change something about the environment or give yourself a moment to think. Eva said that if she is working on a behavior and something unexpected happens and she’s not sure what she wants to do next, she can just extend the reward time (by feeding additional cookies or playing longer). She said “If I’m not sure what I want my dog to do after the reward, I don’t end it.”

Using long lasting rewards to reposition the dog is not just about moving from one place to another. Emelie had a nice example showing teaching Scout to go from a down to a sit. She used a cookie to lure the down, and then clicked the dog as it popped up into the sit. By using food to get the down, Scout didn’t have to think about what she was supposed to do after getting the reward for sitting. If Emelie had fed her in the sit or thrown the food, Scout would then have had to be cued to down so she could work on down -> sit. This might have caused some confusion between sit and down.

This was an example of using a long lasting reward to make it easy for the dog to offer the desirable behavior right after the reward and also shows how the use of the reward to reposition the dog can help keep voluntary behaviors clean. They don’t want the dog wondering if it should wait for a cue or just offer a behavior. Emelie did mention that there are other ways to get the dog to the starting point. They sometimes use a recall.

Other aspects of “end of reward = starting point:”

Handler mechanics:

· Are you ready? Are you prepared to act? (ready to do the right thing at the right time)

· Do you know what to do? Have you practiced?

· Is all the gear where it needs to be? (clicker, treats, toys, targets, dumbbell, etc…)

· Are your reinforcers ready? In the right pocket or location?

· Are you where you should be? (in position, facing the right direction, …)

· Are you aware of what the dog is using for cues?

They had some interesting examples of dogs using the handler’s body language as cues for behavior, instead of voluntarily offering behavior. This is important because if you think the dog is offering behavior, but it is really cueing off your movement, that can lead to confusion. If you want the dog to be offering a behavior after the reward, you have to check to make sure that is really what is happening. In one case, the dog was not returning to heel position as an offered behavior, but doing it in response to the shoulder shrug of the handler. They had to change the way the handler moved to make sure the dog was truly offering behavior.

Environment:

· Note what’s in the environment when the reward ends

· Distractions can be potential triggers and reinforcers for unwanted behavior

· Avoid making reward end a cue for unwanted behavior

When talking about the environment, they stressed the importance of starting out in an environment where the dog would not be distracted between rewards. It’s not uncommon for a dog to pay attention to the trainer, get his reward, take a moment to look around at what else is going on, and then pay attention to the handler. This pattern can become problematic because the dog becomes conditioned to disengage briefly after the reward. It’s one reason they start with reward following reward before they insert any behavior.

Eva had an example of what can go wrong when not enough attention is paid to what happens after the reward. She was counter-conditioning her border collie to look at her when she saw a car. She would feed a cookie immediately after the dog looked at a car. After a period of time she realized that the dog would intentionally look for a car after getting a cookie so the cookie became a cue to look for a car. Not what she had planned. She had to take the dog and work in a very distraction free environment to break that pattern. If she had to work in a more distracting environment, then she would plan carefully so that the reward ended when it was quieter (no dogs running) or she could use transport to take the dog around a corner or behind something.

The dog’s internal state:

In what “frame of mind” is the dog when the reward ends and he can start working?

· Arousal level

· Emotionality

· Motivation

If the dog is not in the frame of mind that you want, or you expect you are going to have to work when the dog is not in the ideal frame of mind, then you need to teach the dog to work under those conditions. They had a high-energy golden retriever and instead of trying to tone down everything, they used a long lasting treat to give him time to slow down and settle before asking for another behavior. They didn’t try to have him maintain that quieter frame of mind the entire session. They used the type and placement of reward to slow him down and then allowed him to get more energetic. Over time he learned to be just as reliable in the higher energy state as he was in the lower one.

They finished up with a little recap about how reward ending is a cue. The nice thing about a cue is that you can change it. So if you have accidentally trained reward ending to be a cue for an unwanted behavior, it’s just a matter of deciding what you do want and re-training it.

There were tons of good ideas in this lecture and a lot of light bulbs went off in my head about things that have happened in the past (good and bad) and about how this applies to horses. One of the challenges of clicker training with horses is that if you want to get the most up-to-date information on clicker training, you have to be able to look at work that is being done with other species and learn to apply it to horses. This is changing, as more people are exploring clicker training with horses, but I still find that I learn most of my new information from people working with dogs or zoo animals.

With this in mind, what can we take from Eva and Emelie’s presentation? I think there are a few key ideas and some practical applications too. Here are my thoughts:

REWARD ENDS IS A CUE!

I think this is a really significant point. Since watching the video, I have been paying more attention to what my horse does after she gets her treat. There are two aspects to this. One is noticing if my horse does some unwanted behavior and the other is deciding what behavior I want.

I have made this distinction because I think it’s important to note the unwanted behavior and think a little bit about why it is happening, instead of just training another behavior to replace it. Every behavior has a function so if my horse is looking around or showing signs of stress or frustration after the reward, then I need to address that. It may be that I just need to be clearer about what I want, but it may be that the horse is telling me something about her comfort level in the environment or with that behavior.

One of the examples that Eva used was the dog who couldn’t work in a very distracting environment. There was no point in continuing to train in that location. She found a quieter location where the dog was more likely to get the reward and orient back to her and built the behavior there before asking for it in more challenging situations. It’s important to look at the whole picture.

I’m going to add an extra little point here which is that some of the comments about paying attention to what happens after reward ends also apply to what happens after you click. What do you want your horse to do when you click? What does it mean if you click and your horse immediately looks away because it’s distracted or moves into your space? What is the click a cue for?

If I do decide that it’s just a matter of being clearer in my training, then I need to decide what I want to have happen as reward ends. As Eva pointed out, I don’t necessarily want the same response in every training scenario. In some cases, I want my horse to finish her reward and then orient to me, or at least be waiting for the next cue. But in other cases (free shaping), I want her to immediately repeat what was just clicked.

And don’t forget that there are really three different types of reward ending. There’s reward ending within a session, where I am going to ask for another behavior. There’s reward ending between training intervals, where I might be taking a short break. And then there’s reward ending that is the end of the session. You need to think about how to handle/train each of these.

With my horses, I have different routines for different times when I am interacting with them.

· Around the barn routine: Sometimes I am doing things that are part of our daily routine (feeding, cleaning stalls, taking to and from turnout). They are getting clicked for behavior, but their reinforcement rate is at a level to maintain behavior. When I leave it’s no big deal because there are usually alternate behaviors they can do and alternate reinforcers available.

· Formal training session: This is when I take the horse out of its living environment and go to the ring or for a ride and we have a longer training session. When this is over, the horse gets returned to its stall or field and I give a larger reward. I usually dump some treats in a bucket or leave them with something extra.

· Formal training session in the barn: If I do some training in their stall or field, where we are learning something new or working with a higher rate of reinforcement, then I treat this the same way I do a session out of the barn, by dumping extra treats in their bucket when I leave. This keeps them busy until I have exited their space and makes it clear that I’m done.

I find that this type of session is the one where the horse is most likely to show unwanted behaviors after reward ends so I am careful to leave the horse with enough treats to keep it busy for a few minutes and I carefully observe what the horse does when it is done eating. If it gets in the habit of coming back over toward me and trying to get my attention so we can do more, then I need to add additional alternate reinforcers to its environment or change something about my routine so that it’s clear that when we’re done, we’re done.

THE VALUE OF LONG LASTING REWARDS:

Most of the time when I reinforce my horse, it’s with a few larger hay pellets or a carrot. When Eva and Emelie first started talking about long lasting rewards, they showed walking with a dog that was licking a treat held in their hand. I just couldn’t picture doing that with a horse. But then when Eva talked about feeding cookie after cookie, I realized we could do this and that I already do this. Sometimes I do it as part of what I think of as a “management solution.” Other times I do it to tighten up a loop and get rid of unwanted behavior. Here are two examples:

Management (or can it be more?):

What if you need to ask your horse to do something and you haven’t been able to train it yet? One option is to just feed the entire time. I used to think of this as kind of a “non-training” solution because the horse doesn’t necessarily learn anything about what behavior you want. The food often works more as a distraction. But now I am thinking about it a bit differently. It’s just reward -> reward -> reward so the horse doesn’t have time to do unwanted behavior. Rather than think of it as an alternative solution, it might be more helpful to think of it as a starting place.

Right after I watched this I had to trim Miss Aurora’s feet. She's only 10 months old and I've only had her for a few months. We have been working on hoof handling, but I knew I didn’t have enough duration or a solid enough behavior to be clicking and treating for trimming. So I had my daughter just feed her for the first foot. By the time we got to the last foot, we were starting to vary the reinforcement just a bit so that she was thinking a little bit about what she was doing and how it was affecting the steady flow of food. We got the job done with minimal fuss and even better, the next time I worked on hoof handling with her, her behavior had improved.

Getting rid of unwanted behaviors:

A lot of horses get in the habit of coming into the person’s space between click and treat. I will confess that with my own horses, I don’t expect them to stand frozen in a head forward position while I get out the food. I am ok with them orienting toward me as long as it’s more of a “oh goody, here it comes” as opposed to “give it to me or I’ll take it” attitude. (nice and technical, I know).

Anyway, if I do have a horse that gets a bit muggy, what I do is feed with the head in the position I want, click for taking the reward (eating a reward is behavior), and then reload that hand with my other hand. This way the horse learns to just stand there eating while I do multiple clicks. It’s a nice way to make it really clear to the horse that coming into my space is not a behavior I want or that he needs to do.

After the horse has been doing this for a while, then I slowly add behavior. So I click and feed, click for eating, then get more food with the same hand, present it and click for eating. The behavior the horse adds is keeping his head forward while I move my hand away. And I can break this down into smaller steps if I need to.

STRUCTURING YOUR SESSION:

The last thing I want to talk about is structuring your session. They shared how they have a training area and a station and how they move the animal from one to the other with transport.

I think this is a useful idea for several reasons. While horses seem to be ok with longer sessions, they still need breaks. There are a few different ways to do this with horses.

· Put the horse back in his stall or field (if it’s accessible and convenient). I do this mostly with novice horses or when I am teaching people. We work a bit, then we put the horse away with some treats and maybe some hay for a few minutes. I can use those minutes to explain what we just did, plan the next training interval and let the horse process. Then we take the horse out and start again. When I start again, I try to observe the horse’s behavior so that starting the session does not reinforce unwanted behavior.

· Take a walk. This is another great way to take a break. If your horse has good basic leading skills, then you can work for a bit, then let the horse walk on a loose rein or lead for a minute or two. It gives you both time to think and process.

· Put the horse on a mat. This is the same idea as having a station. You are going to use the mat as a place for the horse to just relax for a few minutes. If you want to use the mat this way, then it’s important that you have trained relaxed mat behavior so that the mat is truly a place for the horse to just hang out.

· Use an easy exercise that the horse knows well. Alexandra Kurland’s “Stand quietly while the grown-ups are talking” exercise can be used as a break. Again, if you really want it to function as a break, it needs to be trained as a relaxed behavior. I have also used head lowering both standing and in movement as a way to give us both a little time.

Some of these do require that you can get your horse from where you are working to another location (or continue to walk) so your horse does have to have some basic leading skills. Eva and Emelie often use the last reward as a treat magnet to move the dog, but I can’t see doing this with a horse. If the horse did not have good leading skills, you could use targeting or just ask for the behavior, which would work with stationary behaviors like grown-ups and head down, and could also be used for going to the mat if you have that on cue.

Hopefully this article has given you some new ideas about the importance of what happens immediately after you deliver the reward and how you can use that to your advantage. I think that this is one area where unwanted behaviors can sneak in and also that we often don't take advantage of how much food delivery can be used to set the animal up for the next repetition. If you want to read more about food delivery, you might want to read this article: How to Use Different Food Delivery Techniques as Part of Your Training.

Katie Bartlett, 2015 - please do not copy or distribute without my permission
