It seems odd to me that we measure the explanatory power of a regression model in “percent of variance explained”, or \(R^2 = cor(\hat{y},y)^2 = r^2\), even though we all know that variance is just an auxiliary quantity used to compute the more meaningful measure of uncertainty, the standard deviation. Risk in finance or uncertainty in prediction is measured by \(\sigma\), not by \(\sigma^2\). Knowing the reduction in variance in a regression model seems much less useful than knowing the reduction in standard deviation.
In fact, whenever I try to explain \(R^2\) to my students, I usually start by comparing the overall variation of y (as measured by \(\sigma_y\)) to the remaining variation around the regression line (measured by \(\sigma_{\epsilon}\)). That idea comes across much more naturally than the comparison of the variances, which really have no direct interpretation!
So I propose a new measure which is truly “the amount of standard deviation explained”. We can quickly derive it by taking square roots of the two sums of squares: \[
R^2 (= r^2) = 1 - \frac{RSS}{TSS} \quad\Longrightarrow\quad 1 - \frac{\sqrt{RSS}}{\sqrt{TSS}} = 1 - \sqrt{1 - r^2}
\] where RSS = “residual sum of squares” (\(\propto \sigma_{\epsilon}^2\)) and TSS = “total sum of squares” (\(\propto \sigma_{y}^2\)). The last equality holds because \(RSS/TSS = 1 - r^2\).
Comparing the traditional \(r^2\) with the new measure \(1-\sqrt{1-r^2}\) reveals that substantially stronger correlations \(cor(\hat{y},y)\) are needed to achieve a similar “uncertainty reduction”. For example, what one used to call a high value of \(R^2 = 0.8\), explaining 80% of the variance, would have reduced the true uncertainty by merely 55%!
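As a quick sanity check of the 55% figure, here is a minimal sketch in R. The helper name `sd_explained` and the simulated data are illustrative assumptions, not part of the original analysis. It also confirms on a fitted `lm` that the ratio of residual to total standard deviation gives the same number as \(1-\sqrt{1-r^2}\).

```r
# Hypothetical helper: share of standard deviation "explained" for a given R^2
sd_explained <- function(r2) 1 - sqrt(1 - r2)

sd_explained(0.8)   # roughly 0.553, i.e. about 55%

# Simulated data, purely for illustration
set.seed(1)
n    <- 200
pred <- rnorm(n)
resp <- 2 * pred + rnorm(n)
fit  <- lm(resp ~ pred)

r2     <- summary(fit)$r.squared
# sd() uses the same n - 1 denominator for residuals and response,
# so this ratio equals sqrt(RSS / TSS) exactly
direct <- 1 - sd(resid(fit)) / sd(resp)

c(r2 = r2, from_r2 = sd_explained(r2), direct = direct)
```

Note that `direct` is based on `sd(resid(fit))`, not on the residual standard error reported by `summary(fit)`, which divides by the residual degrees of freedom and would therefore differ slightly.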
The graph below shows the stronger convexity of this alternative measure.
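A comparison of this kind can be reproduced with a few lines of base R. This is only a sketch: the axis choices, line types and variable names are assumptions, not the original figure.

```r
# Grid of correlations between fitted and observed values
r <- seq(0, 1, by = 0.01)

var_explained <- r^2                 # traditional R^2
sd_expl       <- 1 - sqrt(1 - r^2)   # share of standard deviation explained

plot(r, var_explained, type = "l", lty = 2,
     xlab = "correlation r", ylab = "share explained")
lines(r, sd_expl, lty = 1)
legend("topleft",
       legend = c("variance explained: r^2",
                  "sd explained: 1 - sqrt(1 - r^2)"),
       lty = c(2, 1), bty = "n")
```

The solid curve stays below the dashed one for every correlation strictly between 0 and 1 and only catches up at the endpoints.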
In his brilliant book The Witches, Roald Dahl gives a probabilistic description of witch features:
How to Recognise a Witch
The next evening, after my grandmother had given me my bath, she took me once again into the living-room for another story. “Tonight,” the old woman said, “I am going to tell you how to recognise a witch when you see one.” “Can you always be sure?” I asked. “No,” she said, “you can’t. And that’s the trouble. But you can make a pretty good guess.” She was dropping cigar ash all over her lap, and I hoped she wasn’t going to catch on fire before she’d told me how to recognise a witch. “In the first place,” she said, “a REAL WITCH is certain always to be wearing gloves when you meet her.”
“Surely not always,” I said. “What about in the summer when it’s hot?” “Even in the summer,” my grandmother said. “She has to. Do you want to know why?” “Why?” I said. “Because she doesn’t have fingernails. Instead of fingernails, she has thin curvy claws, like a cat, and she wears the gloves to hide them. Mind you, lots of very respectable women wear gloves, especially in winter, so this doesn’t help you very much.” “Mamma used to wear gloves,” I said. “Not in the house,” my grandmother said. “Witches wear gloves even in the house. They only take them off when they go to bed.”
And he goes on with wigs, scratching scalps, nose holes, their eyes, blue spit and the missing toes 🙂 Like an experienced data analyst he points out the power of pooling evidence:
“None of these things is any good on its own,” my grandmother said. “It’s only when you put them all together that they begin to make a little sense.”
A splendid narration of this chapter can be found here:
I find the gloves a wonderful example of the natural absence of main effects in the presence of a pure interaction term.
This topic continues to receive a lot of attention in the statistical community, e.g.:
(The probability of a non-witch wearing gloves in any season is \(0.15 \cdot 0.7 + 0.9 \cdot 0.3 = 0.375\).)
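That figure is simply the law of total probability over the two seasons. A minimal check in R, using the season and glove probabilities from the simulation below (the variable names are illustrative):

```r
# Season probabilities and glove-wearing probabilities for non-witches
p_season              <- c(summer = 0.7,  winter = 0.3)
p_gloves_given_season <- c(summer = 0.15, winter = 0.9)

# Marginal probability that a non-witch wears gloves
p_gloves_nonwitch <- sum(p_season * p_gloves_given_season)
p_gloves_nonwitch   # 0.375
```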
# Generate data: 400 women, about 10% of them witches
set.seed(134)
N <- 400
x <- data.frame(
  witch  = sample(c(1, 0), N, replace = TRUE, prob = c(0.1, 0.9)),
  season = factor(sample(c("summer", "winter"), N, replace = TRUE, prob = c(0.7, 0.3)),
                  levels = c("winter", "summer")),
  gloves = 0
)
# Witches always wear gloves
w <- which(x$witch == 1)
x$gloves[w] <- 1
# 15% of normal women wear gloves in summer
ii <- which(x$witch == 0 & x$season == "summer")
x$gloves[ii] <- sample(c(1, 0), length(ii), replace = TRUE, prob = c(0.15, 0.85))
# 90% of normal women wear gloves in winter
ii <- which(x$witch == 0 & x$season == "winter")
x$gloves[ii] <- sample(c(1, 0), length(ii), replace = TRUE, prob = c(0.9, 0.1))
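
One way to check that simulated data of this kind match the stated probabilities is to tabulate glove wearing by witch status and season. The sketch below is self-contained: it regenerates comparable data in a vectorized way, so the name `sim` and the use of `rbinom` are illustrative choices and its random draws will differ from those in `x`.

```r
set.seed(134)
N <- 400

# Same design as above: about 10% witches, 70% of encounters in summer
witch  <- rbinom(N, 1, 0.1)
season <- factor(sample(c("summer", "winter"), N, replace = TRUE, prob = c(0.7, 0.3)),
                 levels = c("winter", "summer"))

# Glove probability: 1 for witches, 0.15 (summer) or 0.9 (winter) otherwise
p_gloves <- ifelse(witch == 1, 1, ifelse(season == "summer", 0.15, 0.9))
gloves   <- rbinom(N, 1, p_gloves)

sim <- data.frame(witch, season, gloves)

# Empirical share of glove wearers in each witch-by-season cell
round(with(sim, tapply(gloves, list(witch = witch, season = season), mean)), 2)

# Overall share among non-witches, to compare with the theoretical 0.375
mean(sim$gloves[sim$witch == 0])
```

The witch row of the table should be exactly 1 in both seasons, while the non-witch row should land near 0.9 for winter and 0.15 for summer, up to sampling noise.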