Rational Polarization Can Be Profound, Persistent, and Predictable

10/3/2020

(2000 words; 9 minute read.)

So far, I’ve laid the foundations for a story of rational polarization. I’ve argued that we have reason to explain polarization through rational mechanisms; showed that ambiguous evidence is necessary to do so; and described an experiment illustrating this possibility.

Today, I’ll conclude the core theoretical argument. I'll give an ambiguous-evidence model of our experiment that both (1) explains the predictable polarization it induces, and (2) shows that such polarization can in principle be profound (both sides end up disagreeing massively) and persistent (neither side changes their opinion when they discover that they disagree).

With this final piece of the theory in place, we’ll be able to apply it to the empirical mechanisms that drive polarization, and see how the polarizing effects of persuasion, confirmation bias, motivated reasoning, and so on, can all be rationalized by ambiguous evidence.

Recall our polarization experiment.

I flipped a coin, and then showed you a word-completion task: a series of letters and blanks that may or may not be completable by an English word. For example, FO_E_T is completable (hint: where are there lots of trees?); but _AL_W is not (alas, Bernhard Salow is not yet sufficiently famous).

One group—the Headsers—saw a completable string if the coin landed heads; the other group—the Tailsers—saw a completable string if the coin landed tails. As they did this for more and more tasks, their average confidence that the coins landed heads diverged more and more:

Question: what drives this polarization—and why think that it is rational?

Our word-completion task provides an instance of what is sometimes called a “good-case/bad-case asymmetry” in epistemology (Williamson 2000, Lasonen-Aarnio 2015, Salow 2018). The asymmetry is that you get better (less ambiguous) evidence in the “good” case than in the “bad” case—and, therefore, it’s easier to recognize that you’re in the good case (when you are) than to recognize that you’re in the bad case (when you are).

In our experiment, the “good case” is when the letter-string is completable; the “bad case” is when it’s not. The crucial fact is that it’s easier to recognize that a string is completable than to recognize that it’s not. It’s possible to get unambiguous evidence that the letter-string is completable (all you have to do is find a word). But it’s impossible to get unambiguous evidence that it’s not completable.

In particular, what should you think when you don’t find a word? This is some evidence that the string is not completable—but how much? After all, you can’t rule out the possibility that you should find a word, or that you should at least have an inkling that there’s one. More generally, you should be unsure what to make of this evidence: if there is a word, you have more evidence that there is; if there’s not, you have less; but you can’t be sure of which possibility you’re in.

There are a variety of models we can give of your evidence to capture this idea, all of which satisfy the value of evidence and yet lead to predictable polarization (see the Technical Appendix (§5.1) for some variations).

Here's a simple one:

Either there is a word, or there’s not; and either you find one, or you don’t—but you can’t find a word that’s not there, so there are only 3 types of possibilities (the circles in the diagram).

The numbers inside the circles indicate how confident you should be, beforehand, that you’ll end up in those possibilities: you should be ½ confident that there’ll be no word (and you won’t find one), since that’s determined by a coin flip; and if there is a word, there’s some chance (say, ½) you’ll find one, meaning you should be ½*½ = ¼ confident you’ll end up in each of the Word-and-Find (top right) and Word-and-Don’t-Find (bottom right) possibilities.

Meanwhile, the labeled arrows from possibilities represent how confident you should be after you see the task, if in fact you’re in that possibility.

If there's a word and you find one, you should be sure of that—hence the arrow labeled “1” pointing from the top-right possibility to itself. If there’s no word and you don’t find one, you should be somewhat confident of that (say ⅔ probability), but you should leave open that there’s a word that you didn’t find (say, ⅓ probability). But if there is a word and you don’t find one, you should be more confident than that—after all, since there is a word, you’ve received more evidence that there is, even if that evidence is hard to recognize. Any higher number will do, but in this model you should be ⅔ confident there’s a word if there is one and you don't find one.

If you don’t find a word, your evidence is ambiguous because you should be unsure how confident you should be that there’s a word—maybe you should be ⅓ confident; maybe instead you should be ⅔ confident. (In a realistic model there would be many more possibilities, but this simple one illustrates the structural point.)

There are two important facts about this model: (1) it is predictably polarizing, and yet (2) it satisfies the value of evidence.

Why is the evidence predictably polarizing? You start out ½ confident there’ll be a word. But your prior estimate for how confident you should end up is higher than ½. After all, there’s a ½ chance your confidence should go up—perhaps way up, perhaps only somewhat up. Meanwhile, there’s a ½ chance it should go down—but not very far down. Thus, on average, you expect seeing word-completion tasks to provide evidence that there’s a word.

(Precisely: your prior expectation of the posterior rational confidence is ½*⅓ + ¼*⅔ + ¼*1 = 7/12, which is greater than ½.)

Notice that if you had unambiguous evidence—so that the rational confidence was the same at all possibilities wherein you don’t find a word—this model would not be predictably polarizing. (Then your prior expectation would be ¾*⅓ + ¼*1 = ½.)
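
Both expectations can be checked with exact fractions. What follows is a minimal sketch of the three-possibility model as described above; the names `PRIOR`, `AMBIGUOUS`, `UNAMBIGUOUS`, and `expected_posterior` are illustrative labels of mine, not notation from the Technical Appendix.

```python
from fractions import Fraction as F

# Prior probability of each possibility: the coin flip decides whether there
# is a word, and (by assumption) you find an existing word half the time.
PRIOR = {"no_word": F(1, 2), "word_not_found": F(1, 4), "word_found": F(1, 4)}

# Rational posterior confidence that there is a word, at each possibility.
AMBIGUOUS = {"no_word": F(1, 3), "word_not_found": F(2, 3), "word_found": F(1)}

# Unambiguous variant: the same confidence (1/3) whenever no word is found.
UNAMBIGUOUS = {"no_word": F(1, 3), "word_not_found": F(1, 3), "word_found": F(1)}


def expected_posterior(prior, posterior):
    """Prior expectation of the posterior confidence that there is a word."""
    return sum(prior[w] * posterior[w] for w in prior)


ambiguous_expectation = expected_posterior(PRIOR, AMBIGUOUS)
unambiguous_expectation = expected_posterior(PRIOR, UNAMBIGUOUS)
print(ambiguous_expectation)    # 7/12
print(unambiguous_expectation)  # 1/2
```

The only difference between the two tables is the entry for the Word-and-Don't-Find possibility, which is where the ambiguity lives.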

So what drives the predictable polarization is the ambiguity—in particular, the fact that when you don’t find a word, you should be more confident the string is completable if there is a word than if there’s not.

This, incidentally, is empirically confirmed: in my experiment, amongst tasks in which people didn’t find a word (had confidence <100%), the average confidence when there was no word was 44.6%, while the average confidence when there was a word was 52.3%—a statistically significant difference. (Stats: t(309) = 2.77, one-sided p = 0.003, and d=0.32.)

Why is the evidence valuable, despite being polarizing? Note that the rational posterior degrees of confidence are uniformly more accurate than the prior rational confidence: no matter what possibility is actual, you become uniformly more confident of truths and less confident of falsehoods.

This can be seen by noting that, in each possibility, the probabilities always become more centered on the actual possibility. For example, suppose there’s no word. Then initially you should be ½ confident of this, and afterwards you should be ⅔ confident of it. Conversely, suppose there is a word. Then initially you should be ½ confident of this, but afterwards you should be either ⅔ confident of it (bottom right), or certain of it (top right). And so on.

Because of this, the model satisfies the value of evidence: no matter what decision about the word-completion task you might face, you should prefer to get the evidence before making your decision. (Proof in the Technical Appendix, §5.1.)
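
The "more centered on the actual possibility" observation can be spelled out numerically. This is only a sketch of that one check, not a proof of the value of evidence (for that, see the Technical Appendix). It assumes, as the post's numbers suggest but do not state outright, that in the Word-and-Don't-Find possibility the posterior puts ⅔ on Word-and-Don't-Find, ⅓ on No-Word, and nothing on Word-and-Find. All names are illustrative.

```python
from fractions import Fraction as F

PRIOR = {"no_word": F(1, 2), "word_not_found": F(1, 4), "word_found": F(1, 4)}

# POSTERIOR[w] is the rational posterior distribution if w is actual.
# The word_not_found row is an assumption (see the note above).
POSTERIOR = {
    "no_word": {"no_word": F(2, 3), "word_not_found": F(1, 3), "word_found": F(0)},
    "word_not_found": {"no_word": F(1, 3), "word_not_found": F(2, 3), "word_found": F(0)},
    "word_found": {"no_word": F(0), "word_not_found": F(0), "word_found": F(1)},
}

WORD = {"word_not_found", "word_found"}  # possibilities where there is a word


def prob(dist, proposition):
    """Probability that dist assigns to a set of possibilities."""
    return sum(p for w, p in dist.items() if w in proposition)


# For each actual possibility, record (prior, posterior) confidence in the
# actual possibility itself and in the truth about whether there is a word.
shifts = {}
for actual, post in POSTERIOR.items():
    truth = WORD if actual in WORD else set(PRIOR) - WORD
    shifts[actual] = {
        "actual": (prob(PRIOR, {actual}), prob(post, {actual})),
        "word_question": (prob(PRIOR, truth), prob(post, truth)),
    }
    print(actual, shifts[actual])
```

In every row the second number of each pair is larger than the first: whichever possibility is actual, confidence in it, and in the true answer to "is there a word?", goes up.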

(700 words left)

Profound, Persistent Polarization

How, in principle, could this type of evidence lead to profound and persistent polarization?

First note what happens when we divide people into Headsers and Tailsers: we give them symmetric, mirror-image types of evidence. Headsers see completable strings when the coin lands heads; Tailsers see them when it lands tails. Thus Headsers tend to get less ambiguous evidence when the coin lands heads, while Tailsers tend to get less ambiguous evidence when the coin lands tails:

As a result, Headsers should become (on average) more confident in heads, while Tailsers should become (on average) more confident of tails.

(Precisely: although both start out ½ confident of heads, on average Headsers should be 1/6 more confident of heads than Tailsers should be.)
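
The 1/6 figure follows from the simple model by mirroring it. A short sketch, with illustrative names, assuming the same numbers as in the model above:

```python
from fractions import Fraction as F

PRIOR = {"no_word": F(1, 2), "word_not_found": F(1, 4), "word_found": F(1, 4)}
CONF_WORD = {"no_word": F(1, 3), "word_not_found": F(2, 3), "word_found": F(1)}

# Expected posterior confidence that the string is completable.
expected_word = sum(PRIOR[w] * CONF_WORD[w] for w in PRIOR)

# For Headsers, "there is a word" just is "the coin landed heads";
# for Tailsers, "there is a word" just is "the coin landed tails".
headser_heads = expected_word
tailser_heads = 1 - expected_word

gap = headser_heads - tailser_heads
print(headser_heads, tailser_heads, gap)  # 7/12 5/12 1/6
```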

Now consider: what happens if we present each group with a large number of independent word-completion tasks? (For simplicity, imagine they all know that they’re 50% likely to find a word if there is one, so they don’t learn anything new about their abilities as they proceed.)

Each time they’re presented with a word-completion task, they face a question: “On this toss, will I find a word, and will the coin land heads or tails?” Since the coin tosses are each fair and independent, the answers to all of these questions are independent: knowing the answers to some of them has no bearing on the others. Moreover, we’ve seen that with respect to each one of these questions, the evidence is valuable.

In fact, more is true. Let $Q$ be the question "How will each of the coins land?" By iterating this process in the right way, we can make it such that at each stage $i$ of the process, you should expect that the evidence you'll receive about coin $i+1$ is valuable with respect to $Q$. (This, mind you, is the most subtle philosophical and technical step—see the Technical Appendix, §5.2, for more discussion.)

Thus at each time, if what you care about is getting to the truth about how any of the coins landed, you should gather the evidence.

Suppose Headsers and Tailsers both do this. Then it will predictably lead to profound and persistent disagreement.

Why? By the weak law of large numbers, everyone can predict with confidence that Headsers should wind up very confident that around 7/12 (≈58%) of the coins landed heads, while Tailsers should wind up very confident that around 5/12 (≈42%) of the coins did.
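
A small simulation illustrates the law-of-large-numbers point. It is a sketch under the simple model's assumptions (a fair coin, a 50% chance of finding a word when there is one, and rational credences of 1, ⅔, or ⅓ that there is a word); the function names, the seed, and the number of tasks are arbitrary choices of mine.

```python
import random


def credence_in_word(has_word, rng, p_find=0.5):
    """Rational credence that there is a word, per the simple model."""
    if has_word and rng.random() < p_find:
        return 1.0                       # found a word: certain
    return 2 / 3 if has_word else 1 / 3  # no word found: ambiguous evidence


def average_credence_in_heads(group, n_tasks, seed=0):
    """Average rational credence in heads across n_tasks independent tasks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tasks):
        heads = rng.random() < 0.5
        # Headsers see a completable string on heads, Tailsers on tails.
        has_word = heads if group == "headser" else not heads
        c_word = credence_in_word(has_word, rng)
        total += c_word if group == "headser" else 1 - c_word
    return total / n_tasks


headser_avg = average_credence_in_heads("headser", 100_000)
tailser_avg = average_credence_in_heads("tailser", 100_000)
print(headser_avg, tailser_avg)  # close to 7/12 and 5/12 respectively
```

The average credence in heads is each group's estimate of the proportion of coins that landed heads, so the two averages settle near 7/12 and 5/12 even though the coin is fair.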

Now consider the claim:

Mostly-Heads: more than 50% of the coins landed heads.

Everyone can predict, at the outset, that Headsers will become very confident (in fact, with enough tosses, arbitrarily confident) that Mostly-Heads is true, and Tailsers will become very confident it’s false.
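
To see how confident each group ends up in Mostly-Heads, one can compute the probability that more than half of the coins landed heads, given a sequence of per-toss credences. The sketch below assumes that the final rational credences treat the tosses as independent, each with the credence from the simple model; that is a simplification of the argument in the Technical Appendix (§5.2), not a reproduction of it. The names, seed, and number of tasks (1001, odd to avoid ties) are illustrative.

```python
import random


def credences_in_heads(group, n_tasks, seed=0):
    """Per-toss rational credences in heads for one simulated subject."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_tasks):
        heads = rng.random() < 0.5
        has_word = heads if group == "headser" else not heads
        if has_word and rng.random() < 0.5:
            c_word = 1.0                           # found a word
        else:
            c_word = 2 / 3 if has_word else 1 / 3  # did not find one
        out.append(c_word if group == "headser" else 1 - c_word)
    return out


def prob_more_than_half(credences):
    """P(more than half the tosses are heads), assuming independent tosses."""
    dist = [1.0]  # dist[k] = probability of exactly k heads so far
    for c in credences:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1 - c)
            new[k + 1] += p * c
        dist = new
    n = len(credences)
    return sum(p for k, p in enumerate(dist) if 2 * k > n)


headser_conf = prob_more_than_half(credences_in_heads("headser", 1001))
tailser_conf = prob_more_than_half(credences_in_heads("tailser", 1001))
print(headser_conf, tailser_conf)
```

With about a thousand tasks the Headser's confidence in Mostly-Heads is already very close to 1 and the Tailser's very close to 0; more tasks push them further apart.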

Thus we have profound polarization.

Moreover, even after undergoing this polarization, Headsers will still be very confident that Tailsers will be very confident that Mostly-Heads is false; meanwhile, Tailsers will be very confident that Headsers will be very confident that Mostly-Heads is true. As a result, neither group will be surprised—and thus neither group will be moved—when they discover their disagreement.

Thus we have persistent polarization.

In short: the ambiguity-asymmetries induced by the sort of evidence presented in word-completion tasks can be used to lead rational people to be predictably, profoundly, and persistently polarized. (See the Technical Appendix, §5.2, for the formal argument.)

This completes the theoretical argument of this series: the type of polarization we see in politics—polarization that is predictable, profound, and persistent—could be rational.

The rest of the series will make the case that it is rational. In particular, I’ll argue that this ambiguity-asymmetry mechanism plausibly helps explain the empirical mechanisms that drive polarization: persuasion, confirmation bias, motivated reasoning, etc.

It’s not hard to see, in outline, how the story will go.

For “heads” and “tails” substitute bits of evidence for and against a politically contentious claim—say, that racism is systemic. Recall how Becca and I went our separate ways in 2010—I, to a liberal university; she, to a conservative college.

I, in effect, became a Headser: I was exposed to information in a way that made it easier to recognize evidence in favor of systemic racism. She, in effect, became a Tailser: she was exposed to information in a way that made it easier to recognize evidence against systemic racism.

If that were what happened, then both of us could've predicted that we would end up profoundly polarized—as we did. And neither of us should be moved now when we come back and discover our massive disagreements—as we’re not.

And yet: although we each should think that the other is wrong, we should not think that they are less rational, or smart, or balanced than we ourselves are.

That is the schematic story of how our polarized politics could have resulted from rational causes.

In the remainder of this series, I’ll argue that it has.

What next?
For the formal details underlying the argument, see the Technical Appendix (§5).
Next post: How confirmation bias results from rationally avoiding ambiguity.

11 Comments

André Martins

10/14/2020 01:26:27 pm

This "(Precisely: your prior expectation of the posterior rational confidence is ½*⅓ + ¼*⅔ + ¼*1 = 7/12, which is greater than ½.)" seems wrong, as I said in the previous post

Let me explain. From the point of view of the observer, there are not 3 options. There are two, either she managed to find the word, or not.

If she did, she knows for sure she is at W. There is a 1/4 chance that will happen. That corresponds to the 1/4 * 1 term. So far, so good.

But, if she does not find a word, she must estimate the chance she is at the case where there is a word (W). For that, you have

P(W|failed) = P(W) * P(failed|W) / ( P(W) * P(failed|W) + P(~W) * P(failed|~W) ) = (1/2 * 1/2) / (1/2 * 1/2 + 1/2 * 1) = 1/3

And there is a 3/4 chance this will happen.

So, the actual final expectation goes as
1/4 * 1 + 3/4 * 1/3 = 1/2
No changes, as it should be. What you found out is a nice bias, so subtle it is easy to get the calculations wrong. But it is not rational.

Kevin

10/16/2020 06:32:12 am

Thanks! It is indeed true that if the person simply conditions on whether or not they find a word, then there will be no expected shift. But what the model is doing is more subtle than that.

Let π be the current rational credence function, and P be the future rational one whatever it is. If P is defined to be recovered from π by conditioning on the true member of the partition {Find, ¬Find}, then indeed E_π[P(word)] = 0.5, for exactly the reason you say.

But P is not defined that way. The technical appendix (§5.1, page 25) gives the full definition of P, and the difference is that in the word-but-no-find case, P(word-and-no-find) is higher than the ⅓ probability that it's given by conditioning on ¬Find. As a result, it is indeed true in the model, as said in the blog post, that E_π[P(word)] = 7/12 > ½ = π(word).

So that's the formal fact. I take it what you're saying, though, is that the model is not a good model of rational transitions; if one were rational, one WOULD simply condition on the true member of {Find, ¬Find}, rather than do the update I specified. Is that right?

What I'd say in response is two things.

(1) First, it's a theorem (Fact 4.7, page 19 in the TA) that whenever evidence is ambiguous, the equality E_π[P(q)] = π(q) will fail for some q. So the only way to claim that you're always irrational if this fails is to say that your evidence is never ambiguous—i.e., you should always be certain that you're rational. Certainly there are some definitions of "rational" (perhaps, "ideally rational") where that seems right; but they aren't the kind we have in mind when asking whether humans are rational—after all, we already know that humans are, in general, not certain that their opinions are rational.

(2) Given that, the question is: on what basis could we insist that the correct update is to condition on the true member of {Find, ¬Find}, as opposed to doing the update I describe? I don't think there can be a basis on which to do so, because it's *also* a theorem that my update is guaranteed to make you more accurate about every proposition in the model than the {Find, ¬Find} update is—it "accuracy-dominates" that update. (This is easy to see, since it's the same in the WF and ¬W¬F possibilities, and more confident in W¬F in the possibility where W¬F is actual.) More generally, the following is true: given a choice between the partitional {Find, ¬Find} update and my P-update, for every possible decision problem you expect doing the latter to lead to better decisions than doing the former. (My P-update respects the "value of evidence" with respect to the {Find, ¬Find} update; see Definition 4.9, page 21, and Theorem 5.1, page 26.)

So I take the point that there's another, weaker way of updating—namely, condition on the true member of the partition {Find, ¬Find}—which avoids the predictable polarization. But I don't think there are good grounds for saying that that update is the only rational one to make in this scenario, and in fact, when it comes to accuracy or expected value, the (polarizing) update I propose is strictly better.

Quentin

7/22/2021 04:10:50 pm

I don't understand the logic of ambiguous evidence and these two layers of probabilities either.

It's true that your update is better, but it's based on information that the subject does not have (whether there is an actual completion). At this rate, you could say that an even better update would be: 1 when there's a word and 0 when not, and you're sure to make better decisions.

You say that this update is confirmed empirically, but so it seems that people are sure of what to make of the evidence after all, so it's unambiguous? You didn't ask "what is your confidence that your credence should be X", but only asked for a first order credence after all. And intuitively, what I think is going on (for having done the tasks) is that you have some clue, from linguistic patterns, that there might or might not be a completion for a word, even if you didn't find one. In sum, you have more evidence than just "find/not find".
Not finding a word is ambiguous because the posterior probability does not change to either 0 or 1, but it seems that standard Bayesian inference can account for it? These second order probability distributions remain a bit mysterious to me. I don't see how it applies to the case of word completion.

Finally, ending up thinking with near certainty in the long run that a fair coin is loaded, while only having been exposed to accurate evidence, seems definitely irrational.

7/23/2021 02:53:42 pm

Hi Quentin,

Fair questions! I think we agree a bit more than you think. It's exactly because people have more evidence than just find/not-find, and the fact that this evidence is hard to pin down and know exactly what it is ("how word-like does this string look?") that I think the evidence is ambiguous.

Granted, it's a hard question to say exactly what further evidence they have, and what is an option and what is not for responding to the string, but there are clear cases (always having credences 0 or 1 is not an option; always just conditioning on find/¬find is). My claim is that there's no good basis to say that you only can respond in a given way, P, if P is higher-order certain. Equivalently, some updates can be rationally feasible even if the subject who follows the update can't be sure that they've done so correctly (in this case, the fact that determines whether they have is tied to whether there's a completion). That puts me on an "externalist" side of an (access) externalist/internalist divide, but the point of the word-completion task is exactly to put pressure on that divide.

Final thought: I totally agree that it certainly seems like something has gone wrong if the person predictably becomes convinced that the coin has landed heads more than half the time. But the interesting formal fact is that this can happen through a series of updates each of which they (rationally) expect to make them more accurate about how the coin lands. It's a form of a diachronic tragedy, in Brian Hedden's sense (https://philpapers.org/rec/HEDOAD), where doing the rational thing at each stage can lead to a predictably suboptimal outcome.

7/23/2021 08:58:22 pm

Thank you for your response (and by the way I really enjoyed reading your posts, I think there's definitely something interesting). I guess what I'm looking for is a clearer understanding of what the two layers of probability of the formal model correspond to. Would you say that the first order probabilities correspond to what an ideally rational agent with unlimited computational capacities would infer? For example, if I had enough time to test all the English words I know, and if I could statistically evaluate word patterns, I could compute a probability that the string can be completed, and this would correspond to the ideal rational probability which my second order probability weighs (because I don't actually have unlimited capacities)? Now you could say that sometimes, perhaps most of the time, the probability would be 1 (because I would find a word if I had more time). Or is it something else that the first order probability corresponds to?

7/27/2021 12:00:28 pm

So one of the subtle things here is there aren't really two "layers" of probabilities. There's just one type of probabilities—the rational credences, denoted by P. It's just that P is a function whose values vary across worlds (at worlds where there's a word, P(word) is high; at worlds where there's no word, P(word) is low), and P itself can be uncertain which such values obtain.

An analogy to modal logic might help. Let Bp be a modal operator saying an agent believes that p. Give this a standard doxastic-logic Hintikka semantics, where Bp is true at w iff all the accessible worlds from w are p-worlds. That definition makes clear that what you believe varies across worlds. Thus even when Bp is true, it might be that BBp is true, or that ¬BBp is. (If all the accessible worlds from w are ones where all the accessible worlds are p-worlds, then BBp is true at w; if w accesses a world x that accesses a world y that's not a p-world, then ¬BBp is true at w, even if Bp is true at w since w accesses only p-worlds.)

If you're curious to hear more, two pieces that try to explain this formalism in the probabilistic case are this one by me (https://philpapers.org/archive/DORHU.pdf) and this one by Tim Williamson (https://philpapers.org/rec/WILVIK).

quentin

8/1/2021 03:19:47 am

Thank you for your response, but allow me to insist a bit. The formal information you give is just formal, it does not really help me understand how probabilities should be interpreted (they can help clarify my question maybe).

So, I understand that in your model, degrees of credence have a structure similar to that of possible worlds, which implies that one can have several orders of probabilities (maybe layer is not the right word) and I don't know how to interpret them in the case of word completion.

Take proba p to mean "I *should* believe with certainty p that..." (with a normative interpretation that you seem to adopt sometimes). Then a second order proba is something like: I should believe that I should believe that... But if "should" is something like ideal rationality with unlimited computational abilities, this interpretation seems wrong for the reasons given in my previous comment. Most of the time the proba would be one, and I presume that for ideally rational agents with unlimited abilities, second order credences would just vanish from the picture.

Or take proba p to mean "I believe with certainty p that...", with a purely descriptive interpretation. In the second order case, I would be uncertain about my own uncertain belief about whether a word can be completed, but I would actually have this first-order belief. Is that what you mean? But what is this (descriptive) first order belief exactly? To what does it correspond in the world? A propensity to act in certain ways maybe?

8/3/2021 01:58:43 pm

Got it, I see more what you're asking now!

I would take "proba p" to mean "I should (given my cognitive abilities) believe with certainty that p". So the sort of normativity I want is neither the credences an ideally rational agent would have, nor just whatever credences you happen to have. In particular, I'd think a fair gloss of "I should be certain that p" (in the relevant sense of "should") is "if I were thinking properly, I would be certain that p".

I think this is the intuitive reading of "should believe" in this circumstance. Take a really easy string, like C_T, and imagine you don't complete it after 7 seconds. Nevertheless, I think it's clear that you SHOULD complete it, and if you don't then you are being irrational. (If I said 'cat' on second 8, you'd think to yourself "Oops! How silly/irrational of me".) Meanwhile, if I give you a very HARD string, like A_Z_ _I_ _ _ _, and you don't complete it within 7 seconds, I don't think you're being irrational—you're just failing to be an ideal agent, since indeed Alzheimer's is a word. Since we intuitively think some failures to find words are irrational, and others aren't, we need some normative notion that plays that intermediate role. That's the one I want my "should believe" talk to glom onto.

8/3/2021 04:22:54 pm

Thanks it's a bit clearer now

8/3/2021 04:39:43 pm

Thanks for your questions! I'm writing up a draft of this stuff now so this sort of thing is super helpful in thinking through the presentation. Will probably change some of it in response!


Stranger Apologies


Kevin Dorst
