You don't understand negative reinforcement

by Dorian Minors

October 28, 2015

Excerpt:

The Skinner Box. A terrifying name for a wild ride in the research into automatic behaviour.

Let’s start at the beginning though. We’ve talked before about how when one thing predictab…

filed under:

Unfiled: this is an archived article from our predecessor website, The Dirt Psychology.

Article Status: Complete (for now).

The Skinner Box. A terrifying name for a wild ride in the research into automatic behaviour.

Let’s start at the beginning though. We’ve talked before about how when one thing predictably precedes another, we tend to associate the two things. It’s a kind of learning, and it’s the reason you salivate when I describe a juicy lemon or the reason you might think of your old school swimming carnivals when you smell chlorine.

What we are learning there are the consequences of certain events in the world. But we don’t just care about the consequences of events. We also care about the consequences of our behaviours.

To explore this idea, Edward Thorndike was playing with cats. He put them in puzzle boxes and watched how they got out. At first, a cat would struggle, anxiously zipping around the box and engaging in more or less random behaviour. But once it figured out how to open the door, the next time it was put back in the box it would skip all of that and go straight to the action that had got it out.

Thorndike summed up what he saw by noting that when a behaviour is followed by a good outcome, it happens more often; when it is followed by a bad outcome, it happens less often. He called this the law of effect. This is what psychologists might refer to as the ABC of behaviour: Antecedent (what happens first), Behaviour (what we do in response), and Consequence (what follows, which either makes us want to behave that way again or behave differently the next time the antecedent comes around).
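If it helps to see the law of effect as a mechanism, here’s a minimal toy sketch in Python (the behaviours, numbers, and update rule are invented for illustration; this is not Thorndike’s actual procedure): each behaviour carries a weight, good outcomes nudge the weight up, bad outcomes nudge it down, and heavier behaviours get chosen more often.

```python
import random

# A toy "law of effect" (illustrative only; behaviours and numbers are made up):
# each behaviour has a weight, good outcomes raise it, bad outcomes lower it,
# and heavier behaviours get picked more often.

weights = {"press lever": 1.0, "scratch door": 1.0, "meow": 1.0}

def consequence(behaviour):
    # Hypothetical puzzle box: only pressing the lever opens the door.
    return 1.0 if behaviour == "press lever" else -0.2

for trial in range(200):
    # Antecedent: the box. Behaviour: chosen in proportion to its weight.
    behaviour = random.choices(list(weights), weights=list(weights.values()))[0]
    # Consequence: a good outcome strengthens the behaviour, a bad one weakens it.
    weights[behaviour] = max(0.1, weights[behaviour] + 0.1 * consequence(behaviour))

print(weights)  # "press lever" ends up with by far the largest weight
```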

The birth of reward and punishment

Enter Burrhus Frederic Skinner. Skinner was all about Thorndike’s law of effect and decided to test the crap out of it. And thus came into being the Skinner Box. Skinner would put an animal in a box and have one action result in a reward or a punishment. For example, a pigeon could peck a certain spot and food would be delivered, or a rat could press a lever and it might get an electric shock.

From his studies, he developed (and we have since massively expanded upon) the concept of operant conditioning. This means, quite literally, how acting/operating on the environment results in differences in behaviour. Or, as you might more commonly know it, the principles of reward and punishment.

Essentially, this is about how we learn the consequences of our actions. How does the world react when we do certain things? What can we do to make the world better for us, or worse?

The four kinds of behaviour modification

There are four words we need to know when it comes to operant conditioning: positive, negative, punishment, and reinforcement (also known, erroneously, as reward). First, the two kinds of consequence:

  • a reinforcement, very simply, is anything that will encourage a behaviour to happen again (that will reinforce a behaviour)
  • a punishment, very simply, is anything that will discourage a behaviour from happening again (i.e. will punish the behaviour)

And each of these has two kinds:

  • positive, meaning that the punishment or the reinforcement is presented; and
  • negative, meaning that the punishment or the reinforcement is removed or taken away.

Let’s walk through them.

Positive reinforcement

Here, you’re presenting something to encourage a behaviour to occur again. So if I give you (present you with) a chocolate for saying something nice about me, you’re more likely to compliment me in the future.

Positive punishment

Here, you’re presenting something to encourage a behaviour not to occur again. So if I yell at you for insulting me, you’re less likely to insult me in the future. I’m presenting the yelling to you, to punish the behaviour.

Negative punishment

Here, you’re removing something to encourage a behaviour not to occur again. If I take your phone off you because you’re texting while I’m talking to you, you’re less likely to text while we chat in the future. I’m removing the phone, so you’ll pay attention to me.

Negative reinforcement

This is the most counterintuitive. Here, you’re removing something to encourage a behaviour to occur again. So, for example, if you turn the fan on when it’s hot, you’re taking the heat away. The next time it’s hot, you’re more likely to turn the fan on. If you’re hungry and you go eat, you’re taking the hunger away. You’re more likely to eat when you get hungry in the future. This seems straightforward, but you might have noticed that when you’re bored you get up and check the fridge every few minutes. That’s because we don’t ordinarily notice the early stages of hunger; when you’re unfocused, you do notice them, and you’re encouraged to eat. This kind of behaviour can be problematic.

Another excellent example of negative reinforcement is procrastination. When you procrastinate, you’re taking away the stress of whatever it is you’re avoiding doing. You forget about it, and you feel better. What are you more likely to do next time you’re stressed about something? Procrastinate. Not ideal.
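If you like your definitions pinned down, the whole two-by-two fits in a few lines of code. This is just a throwaway sketch to keep the terms straight (not anything from the behaviourist literature): answer two questions, whether something was presented or removed, and whether the behaviour became more or less likely, and you’ve named the quadrant.

```python
def classify(stimulus_change, effect_on_behaviour):
    """Name the operant conditioning quadrant from two questions:
    was something presented or removed, and did the behaviour
    become more or less likely afterwards?"""
    kind = "reinforcement" if effect_on_behaviour == "more likely" else "punishment"
    sign = "positive" if stimulus_change == "presented" else "negative"
    return f"{sign} {kind}"

# The examples from the walkthrough above:
print(classify("presented", "more likely"))  # positive reinforcement (chocolate for a compliment)
print(classify("presented", "less likely"))  # positive punishment (yelling at an insult)
print(classify("removed", "less likely"))    # negative punishment (confiscating the phone)
print(classify("removed", "more likely"))    # negative reinforcement (the fan takes the heat away)
```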

Confused? Well, luckily I brought everyone’s favourite sociopath, Sheldon, along to help explain: https://www.youtube.com/watch?v=Mt4N9GSBoMI

Some caveats

We will only learn if the connection between the behaviour and its consequence (the reinforcement or the punishment) is made very clear. A cue that reliably signals that connection is called a discriminative stimulus: it makes it easy to discriminate which behaviours matter in which situations.

However, even with a clear connection, we tend to generalise: things that are similar to the discriminative stimulus will start to elicit the same response too. If we learn to respond to a red light, we might start responding to an orange one after a while. Or a red piece of paper. And so on.

Finally, the more complicated the desired response (and the stupider the thing we’re trying to control), the more difficult it is to influence the behaviour. So when Thorndike put his cats in the puzzle boxes, if they had to perform a complicated behaviour to unlock the door, it took a long time for them to learn it. In these cases, we can use what’s called shaping: we reinforce little successive approximations of the behaviour we’re trying to elicit until eventually the animal does the whole thing at once. For example, you don’t just teach a dog to roll over. First, you train it to sit. Then to stay. Then to roll over. Successively approximate the behaviour.
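As a minimal sketch (the steps and the reward function here are invented for illustration), shaping just means reinforcing the next small approximation the learner hasn’t mastered yet, then moving the goalposts until the full behaviour is in place.

```python
# A toy sketch of shaping: reward ever-closer approximations of the
# full trick instead of waiting for the whole thing to appear at once.

steps = ["sit", "stay", "roll over"]  # successive approximations of the target

def train(mastered, reward):
    """Reinforce the next approximation that hasn't been mastered yet."""
    for step in steps:
        if step not in mastered:
            reward(step)        # reinforce this small step...
            mastered.add(step)  # ...and once it's reliable, move on
            return step
    return None  # the whole behaviour is in place

learned = set()
while train(learned, reward=lambda s: print(f"treat for '{s}'")):
    pass
```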

It really is like that (almost certainly misquoted) quote from Confucius:

Life is really simple, but we insist on making it complicated

Or perhaps, more accurately, people are really simple and we wish we were more complicated.

Don’t like how simple your brain can be? Learn how to boost its connectivity through the right kind of meditation (seriously, it’s science). Or learn the equally simple reason why all groups tend to get stale after a while (and how to avoid it). Giving you the dirt on your search for understanding, psychological freedom and ‘the good life’ at The Dirt Psychology.


Ideologies worth choosing at btrmt.

Join over 2000 of us. Get the newsletter.