Mental Bayesian Updating

How to use ratios to make Bayesian updating in your head easier.
Posted on Aug 9, 2020

This is a short-ish note explaining an interesting trick that makes Bayesian updating easier when you don’t have a pen and paper. The idea certainly isn’t mine, and there are a few other explainer articles about it; I can’t remember where I learned it myself. Generally, little emphasis is placed on this way of applying Bayes’ Rule, which I think is a shame, as it’s way easier. That’s partly because odds are a convenient and often more intuitive way of thinking about probability.

Bayes’ Rule

For the unfamiliar, Bayes’ Rule is a mathematical formula that tells you how to update your beliefs when you encounter new evidence. It can be expressed in the following two (equivalent) ways:
$$P(A|E) = \frac{P(E|A) P(A)}{P(E|A) P(A) + P(E|\bar{A}) P(\bar{A})}$$

$$P(A|E) = \frac{P(E|A) P(A)}{P(E)}$$

In these formulae, $A$ is the event we care about, $E$ is the evidence, $P(X)$ is the probability of $X$ happening, $P(\bar{X}) = 1 - P(X)$ is the probability of $X$ not happening, and $P(X|Y)$ is the probability of $X$ given that $Y$ has happened.
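To make the two forms concrete, here is a minimal Python sketch of each, using the standard library’s `fractions.Fraction` so the arithmetic stays exact. The function names are hypothetical, chosen for this illustration only, and the inputs are made-up numbers rather than anything from the examples below.

```python
from fractions import Fraction


def posterior_expanded(p_e_given_a, p_e_given_not_a, p_a):
    """First form: P(E) is expanded as P(E|A)P(A) + P(E|not A)P(not A)."""
    numerator = p_e_given_a * p_a
    return numerator / (numerator + p_e_given_not_a * (1 - p_a))


def posterior_with_evidence(p_e_given_a, p_a, p_e):
    """Second form: P(E) is known directly."""
    return p_e_given_a * p_a / p_e


# Made-up inputs, chosen only to show that the two forms agree.
p_a = Fraction(1, 3)
p_e_given_a = Fraction(1, 2)
p_e_given_not_a = Fraction(1, 4)
p_e = p_e_given_a * p_a + p_e_given_not_a * (1 - p_a)

print(posterior_expanded(p_e_given_a, p_e_given_not_a, p_a))  # 1/2
print(posterior_with_evidence(p_e_given_a, p_a, p_e))         # 1/2
```

The two functions only agree because `p_e` is computed from the same likelihoods and prior; feeding the second form an inconsistent $P(E)$ would give a different (wrong) answer.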

Anyway, this rule is extremely useful and there’s literally an entire subfield of statistics and computer science devoted to Bayesian updating and statistical inference using this rule.

Example - D20

Let’s look at an example. Say our friend is rolling a D20 and we want to predict the chance that the value she rolls is even (this is our A). Our prior probability for this event (how likely we believe it is before we see any evidence) is, of course, $\frac{1}{2}$ (this is P(A)).

Our friend then tells us that the roll was single-digit (i.e. between 1 and 9 inclusive). How do we update our belief? Well, we use Bayes’ Rule. We need to know P(E|A) in whichever formulation we use (and let’s use the second to keep things simple). This is the probability of the roll being single-digit, given that it’s even. It turns out that 4 of the 10 possible even values are single-digit (2, 4, 6, and 8). P(E), the probability of a D20 roll being a single digit, is clearly $\frac{9}{20}$. So we have:
$$P(A|E) = \frac{P(E|A) P(A)}{P(E)} = \frac{\frac{4}{10}\cdot \frac{1}{2}}{\frac{9}{20}}=\frac{4}{9}$$

And this is what we expect: there are nine possible single-digit numbers, four of which are even.
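Because a D20 has only twenty equally likely faces, the result can also be checked by brute force. This short Python sketch counts faces directly and compares the count with the second form of Bayes’ Rule; the variable names are just for illustration.

```python
from fractions import Fraction

faces = range(1, 21)  # a fair D20: every face equally likely

even = {n for n in faces if n % 2 == 0}      # event A
single_digit = {n for n in faces if n <= 9}  # evidence E

# Direct count: among the single-digit faces, how many are even?
direct = Fraction(len(even & single_digit), len(single_digit))

# The same answer via the second form of Bayes' Rule.
p_a = Fraction(len(even), len(faces))
p_e_given_a = Fraction(len(even & single_digit), len(even))
p_e = Fraction(len(single_digit), len(faces))
via_bayes = p_e_given_a * p_a / p_e

print(direct, via_bayes)  # 4/9 4/9
```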

Example - Disease

Suppose 1 in 10,000 people have some rare disease. If you have this disease, then you always sneeze musically. Suppose 1 in 1,000 people who don’t have this disease also sneeze musically. You sneeze musically - what is the probability you have the disease?

We’ll denote musical sneezing as S, and having the disease as D. We clearly want to work out P(D|S). Using the first formulation this time:
$$P(D|S) = \frac{P(S|D) P(D)}{P(S|D) P(D) + P(S|\bar{D}) P(\bar{D})}$$

$$ = \frac{(1 \cdot \frac{1}{10{,}000})}{(1 \cdot \frac{1}{10{,}000}) + (\frac{1}{1{,}000} \cdot \frac{9{,}999}{10{,}000})}$$

$$=\frac{1}{1 + \frac{9{,}999}{1{,}000}} = \frac{1}{10.999} \approx 9.1\%$$

This result might seem surprisingly low, but it should make some sense when you think about it. Musical sneezing is about eleven times more common than the disease, so while having the symptom should drastically increase your belief that you have the disease, it’s still far more likely than not that you don’t.
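The same numbers can be run through the first form in Python, again with `fractions.Fraction` for exact arithmetic. The last line computes $P(S)/P(D)$, which is where the “about eleven times more common” figure comes from.

```python
from fractions import Fraction

p_d = Fraction(1, 10_000)             # prior: 1 in 10,000 people have the disease
p_s_given_d = Fraction(1)             # everyone with the disease sneezes musically
p_s_given_not_d = Fraction(1, 1_000)  # 1 in 1,000 healthy people do too

# First form of Bayes' Rule, with P(S) written out explicitly.
p_s = p_s_given_d * p_d + p_s_given_not_d * (1 - p_d)
p_d_given_s = p_s_given_d * p_d / p_s

print(p_d_given_s, float(p_d_given_s))  # 1000/10999, roughly 0.0909
print(float(p_s / p_d))                 # 10.999: how much more common the symptom is
```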

Using Ratios

We can do calculations in our head a bit more easily by using odds-ratios. Say we have a prior probability expressed as a ratio $x : y$ (this means that $\frac{x}{x+y} = P(A)$, like bookmakers’ odds). It turns out we can update this ratio when we see new evidence by multiplying it through, term by term, by $P(E|A) : P(E|\bar{A})$ (often called the likelihood ratio or Bayes factor), which is a bit easier to do in your head (I think).
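The update rule is small enough to sketch in a few lines of Python. `update_odds` and `odds_to_probability` are hypothetical helper names; ratios are represented as `(for, against)` pairs of integers or `Fraction`s, and nothing is rescaled until we convert back to a probability.

```python
from fractions import Fraction


def update_odds(prior, likelihoods):
    """Multiply a prior ratio x : y term by term by P(E|A) : P(E|not A).

    Both arguments are (for, against) pairs; the result is the posterior
    ratio x' : y', which does not need to be rescaled.
    """
    (x, y), (l_a, l_not_a) = prior, likelihoods
    return (x * l_a, y * l_not_a)


def odds_to_probability(odds):
    """Read a ratio x : y as the probability x / (x + y)."""
    x, y = odds
    return Fraction(x) / (x + y)


# The D20 example: prior evens (1 : 1), likelihoods 4/10 : 5/10.
d20_posterior = update_odds((1, 1), (Fraction(4, 10), Fraction(5, 10)))
print(odds_to_probability(d20_posterior))  # 4/9
```

Here the posterior comes out as `(2/5, 1/2)` rather than the tidier 4 : 5, but both read as the same probability, since only the ratio matters.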

Given our D20 example from earlier, we have our prior probability as a simple 1 : 1 ratio (evens, if you’ll pardon the pun). We just multiply this through by $P(E|A) : P(E|\bar{A}) = \frac{4}{10} : \frac{5}{10} = 4 : 5$, and of course multiplying 1 : 1 term by term by 4 : 5 gives (1 × 4) : (1 × 5) = 4 : 5. This, expressed as a probability, is $\frac{4}{4 + 5} = \frac{4}{9}$, as before.

What about the disease example? Well, our prior belief that we have the disease (before discovering that we sneeze musically) is 1 : 9,999. We update this by multiplying it term by term by the ratio $P(S|D) : P(S|\bar{D}) = 1 : \frac{1}{1{,}000}$, giving 1 : 9.999, which corresponds to the probability $\frac{1}{10.999} \approx 9.1\%$, as expected.
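As a check on the arithmetic, this Python sketch does the disease update in odds form and compares it, exactly, with the first form of Bayes’ Rule. It uses only the numbers stated in the example.

```python
from fractions import Fraction

# Prior odds of having the disease: 1 : 9,999.
prior_for, prior_against = 1, 9_999

# Likelihood ratio P(S|D) : P(S|not D) = 1 : 1/1,000.
lik_for, lik_against = Fraction(1), Fraction(1, 1_000)

post_for = prior_for * lik_for              # 1
post_against = prior_against * lik_against  # 9,999 / 1,000 = 9.999
odds_answer = post_for / (post_for + post_against)

# The same quantity from the first form of Bayes' Rule.
p_d = Fraction(1, 10_000)
bayes_answer = p_d / (p_d + lik_against * (1 - p_d))

print(float(post_against))          # 9.999
print(odds_answer == bayes_answer)  # True
```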

Proof

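It’s very easy to show that the two are equivalent. If our prior odds-ratio is $x : y$, and our posterior (after the update) is $x' : y'$, then we want to show that the posterior, read as a probability, agrees with the first form of Bayes’ Rule with $P(A) = \frac{x}{x+y}$ and $P(\bar{A}) = \frac{y}{x+y}$: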
$$\frac{x'}{x' + y'} = \frac{P(E|A) \cdot \frac{x}{x + y}}{(P(E|A) \cdot \frac{x}{x + y}) + (P(E|\bar{A}) \cdot \frac{y}{x+y})}$$
Remember that, the way we’ve defined our ratio updating, we have $x' = x \cdot P(E|A)$ and $y' = y \cdot P(E|\bar{A})$, so:
$$\frac{x'}{x' + y'} = \frac{x \cdot P(E|A)}{x \cdot P(E|A) + y \cdot P(E|\bar{A})}$$
This is exactly the right-hand side of the previous equation after you multiply its numerator and denominator through by $x + y$, so it works.
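The algebra above is the proof; as an extra sanity check, this Python sketch compares the two computations on a thousand randomly chosen positive rational inputs. It is a spot check, not a proof. `via_odds` and `via_bayes` are hypothetical names, and the fixed seed only makes the run repeatable.

```python
import random
from fractions import Fraction


def via_odds(x, y, l_a, l_not_a):
    """Posterior probability from the updated ratio x' : y'."""
    x_new, y_new = x * l_a, y * l_not_a
    return x_new / (x_new + y_new)


def via_bayes(x, y, l_a, l_not_a):
    """First form of Bayes' Rule with P(A) = x/(x+y), P(not A) = y/(x+y)."""
    p_a, p_not_a = x / (x + y), y / (x + y)
    return l_a * p_a / (l_a * p_a + l_not_a * p_not_a)


rng = random.Random(0)
for _ in range(1_000):
    # Strictly positive values, so no denominator can be zero.
    x, y = Fraction(rng.randint(1, 100)), Fraction(rng.randint(1, 100))
    l_a = Fraction(rng.randint(1, 100), 100)
    l_not_a = Fraction(rng.randint(1, 100), 100)
    assert via_odds(x, y, l_a, l_not_a) == via_bayes(x, y, l_a, l_not_a)

print("identity holds on all sampled inputs")
```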

Multiple Events

Another reason the odds form of Bayes’ Rule is far better is that it extends easily to multiple events. Say we have three possible (mutually exclusive) events $A_1$, $A_2$, and $A_3$. We can update our prior beliefs about their relative likelihoods using ratios, which is harder with the well-known form. We simply multiply the ratio $P(A_1) : P(A_2) : P(A_3)$ term by term by $P(E|A_1) : P(E|A_2) : P(E|A_3)$ to obtain the updated ratio.
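Here is a minimal Python sketch of the multi-event update. The dice scenario is a hypothetical example invented for illustration: a friend picks a D6, D8 or D20 with equal probability, rolls it, and tells us the result was a 5. Each die’s likelihood is then $\frac{1}{\text{sides}}$.

```python
from fractions import Fraction


def update_ratio(prior, likelihoods):
    """Multiply P(A1) : ... : P(An) term by term by P(E|A1) : ... : P(E|An)."""
    if len(prior) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    return [p * lik for p, lik in zip(prior, likelihoods)]


def normalise(ratio):
    """Rescale a ratio so its terms sum to 1 (fails on 0 : 0 : ... : 0)."""
    total = sum(ratio)
    return [Fraction(r) / total for r in ratio]


# Hypothetical example: D6, D8 or D20 chosen with equal probability; a 5 is rolled.
prior = [1, 1, 1]
likelihoods = [Fraction(1, 6), Fraction(1, 8), Fraction(1, 20)]

posterior = update_ratio(prior, likelihoods)  # 1/6 : 1/8 : 1/20, i.e. 20 : 15 : 6
print(" : ".join(str(p) for p in normalise(posterior)))  # 20/41 : 15/41 : 6/41
```

For mental arithmetic, scaling 1/6 : 1/8 : 1/20 by 120 to get 20 : 15 : 6 is the step that does the work; normalising is only needed at the end, if at all.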

Yet another brilliant thing about it is that there could be a fourth event $A_4$, mutually exclusive with the others, and we could leave it out of our calculations entirely and still have our posterior be an accurate ratio of the probabilities of $A_1$, $A_2$, and $A_3$ (provided that the ratio we’re multiplying by isn’t $0 : 0 : 0$, which could happen)! I won’t prove either of these things here, but they are both true.
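Both claims can at least be illustrated numerically. In this Python sketch, which again uses a hypothetical dice scenario, a friend picks a D4, D6, D8 or D20 with equal probability; $A_1$ to $A_3$ are the D4, D6 and D8, and $A_4$ (the D20) is the event we forget. Helper names are illustrative only.

```python
from fractions import Fraction

# Hypothetical set-up: A1..A3 are a D4, D6 and D8; A4 (a D20) is left out.
TRACKED = [4, 6, 8]
ALL_DICE = TRACKED + [20]


def likelihood(sides, roll):
    """P(roll | fair die with `sides` faces)."""
    return Fraction(1, sides) if 1 <= roll <= sides else Fraction(0)


def posterior_ratio(dice, roll, prior=None):
    """Prior ratio (equal by default) multiplied term by term by the likelihoods."""
    prior = prior or [1] * len(dice)
    return [p * likelihood(sides, roll) for p, sides in zip(prior, dice)]


def normalise(ratio):
    total = sum(ratio)
    if total == 0:
        raise ValueError("ratio is 0 : 0 : ... : 0, so it carries no information")
    return [r / total for r in ratio]


roll = 5
full = normalise(posterior_ratio(ALL_DICE, roll))         # A4 included
tracked_only = normalise(posterior_ratio(TRACKED, roll))  # A4 ignored

# Restricting the full answer to A1..A3 and rescaling gives the same ratio.
assert normalise(full[:3]) == tracked_only
print(" : ".join(str(p) for p in tracked_only))  # 0 : 4/7 : 3/7

# A 15 is impossible on a D4, D6 or D8, so the tracked ratio collapses to 0 : 0 : 0.
print(sum(posterior_ratio(TRACKED, 15)) == 0)  # True
```

The second case is the caveat in action: when every tracked event has zero likelihood, the evidence points entirely at the forgotten $A_4$, and the tracked ratio says nothing about $A_1$ to $A_3$.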

Conclusion

The odds form of Bayes’ Rule is, I think, more intuitive. It’s just a pair of multiplications that you can probably do in your head, rather than a division that probably involves fractions. This should really be the default way that people think about Bayes’ Rule, but sadly it isn’t the way it’s taught. The easy extension to more than two events is also extremely convenient.