Naked Statistics by Charles Wheelan

Rating: 9/10

Best line: Statistics alone cannot prove anything; instead, we use statistical inference to accept or reject explanations on the basis of their relative likelihood.

Next best line: Probability doesn’t make mistakes; people using probability make mistakes.

The effects of statistics, as a discipline, are completely unavoidable today. I can’t think of any other facet of mathematical science that has permeated day-to-day life in the same way. Everything we see as consumers—never mind what we actually buy—is driven by statistics.

There are concerns with all of this and yet there is an undeniable convenience, too. The cost is the loss of authentic connection. A click online instead of an in-person exchange. The benefit is infinite selection at ever-decreasing costs. It isn’t going to change any time soon. We are a number, despite the protestations of an old Bob Seger song.

Impersonal or not, we shouldn’t ignore the fact that this stuff (statistics) works. In fact, I think it is the cornerstone skill that separates the pros from the poseurs in the realms of risk, service delivery, management, decision-making, psychology, and strategy. Oh, and healthcare. And education. And everything.

So we need to understand it. Not the formulas so much. The concepts. This wonderful book by Charles Wheelan is the best place to start. In this review, we’ll cover some of the core concepts that the book does a marvelous job of explaining. It will be a terrible substitute for the book itself, though. As you peruse, when something really resonates, just know that it’s a sign you should purchase a copy. I’m certain of it. So certain that my confidence interval is 95%.

Lies, Damn Lies, and Statistics

Sum, mean, median, percent, percent change, standard deviation. These and other fundamental measures are the stuff of standard descriptive statistics. Anyone who’s messed with an Excel spreadsheet knows to throw these measures at whatever table they’re analyzing. These are the elements of information that tell a story, and that story leads to knowledge. The only problem is that we can easily warp these elements to tell just about any story we want.

They say it was Mark Twain who popularized the lies-and-statistics quote. Sounds about right. But why does it ring true? What makes a statistic feel like a lie has everything to do with the slippery nature of truth. The author illustrates this nicely with a simple example of percent change versus change in percentage points.

Imagine the following … one political party proposes to raise the income tax from 3% to 5% to help fund schools. They say it isn’t a big deal; it is merely an increase of 2 percentage points.

The opposing political party says it’s a terrible idea. Why? Because the tax increase would raise the tax rate by roughly 67%. It’s exorbitant!

Which is right? Statistical statements such as these are difficult because both are factually true. One just happens to be more experientially true. If your tax rate went from 3% to 5%, you would experience it as a 67% increase in what you pay, not as a 2 percentage-point increase. But again, that doesn’t take anything away from the truth of either statement.
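
Here’s a minimal sketch of the arithmetic behind both claims (the rates are simply the hypothetical ones from the example):

```python
# Percentage points vs. percent change for the hypothetical tax increase above.
old_rate = 0.03  # 3% income tax
new_rate = 0.05  # 5% income tax

point_change = (new_rate - old_rate) * 100               # absolute change, in percentage points
percent_change = (new_rate - old_rate) / old_rate * 100  # relative change, in percent

print(f"Change in percentage points: {point_change:.0f}")  # 2
print(f"Percent change: {percent_change:.0f}%")            # ~67
```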

They just so happen to be completely different measures. This is the power that statistics provides. It is the most flexible of mathematics, easily woven into any and all justifications regardless of motive. In fact, motive often defines the statistic of choice. We pick and choose on a whim.

The lesson here is that one must define the values and verbiage of an effort before engaging in its measurement. What is the best measure to reflect the impact of a tax increase? Absolute terms, such as a point increase, or relative terms, such as percent change? The answer defines your method of measurement.

But that’s not the end! Once a method of measurement is determined, it is critical to establish the unit of analysis. Here’s another example from the author:

Imagine we want to assess the performance of our schools (are they getting better or worse?). Based on the values and verbiage that surround the conversation, we decide our best method of measurement: test scores.

Makes sense! But when we examine the test scores, do we measure the school as a whole or do we measure the students?

It’s a real dilemma that is better explained in the author’s own (paraphrased) words:

One analyst decides to assess schools and proclaims “Our schools are getting worse! Sixty percent of our schools had lower test scores this year than last year.”

Another analyst assesses students and proclaims: “Our schools are getting better! Eighty percent of our students had higher test scores this year than last year.”

Again, both analysts are factually correct. But their unit of analysis is different. A majority of schools could indeed suffer lower test scores. Meanwhile, most students could be concentrated in a few large, high-performing schools where test scores are skyrocketing. So what matters more? The majority of schools or the majority of students? The answer determines the accurate depiction. The statistics just provide the data.
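
Here’s a toy sketch of how that happens (the school sizes and score counts are invented purely for illustration): a majority of schools can get worse even while a large majority of students get better, because the improving students are concentrated in the biggest schools.

```python
# Hypothetical district: five schools with invented enrollments and score changes.
schools = [
    # (name, total students, students whose scores went up this year)
    ("A", 2000, 1800),  # large school, most students improved
    ("B", 2000, 1700),  # large school, most students improved
    ("C",  200,   40),  # small school, most students declined
    ("D",  200,   50),
    ("E",  200,   60),
]

schools_improving = sum(1 for _, n, up in schools if up > n / 2)
students_improving = sum(up for _, _, up in schools)
total_students = sum(n for _, n, _ in schools)

print(f"Schools improving: {schools_improving} of {len(schools)}")       # 2 of 5, so 60% of schools got worse
print(f"Students improving: {students_improving / total_students:.0%}")  # ~79% of students got better
```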

Which is to say that statistics can fool us into mistaking obvious precision for apparent accuracy.

This is made even worse by the fact that 57% of statistics are entirely made up.

Right?

Well, no, that’s not right. That number, 57%, was made up. But it feels specific, thus precise, thus accurate. This is better explained by the author:

“Precision reflects the exactitude with which we can express something.”

“Accuracy is a measure of whether a figure is broadly consistent with the truth.”

“If an answer is accurate, then more precision is usually better. But no amount of precision can make up for inaccuracy.”
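
To put rough numbers on the distinction, here’s a small simulation sketch (the “true value” and the two instruments are entirely invented): one instrument is precise but biased, the other is noisier but centered on the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0  # the quantity we are trying to measure (hypothetical)

# Instrument 1: very precise (tiny spread) but inaccurate (biased by +10).
precise_but_biased = rng.normal(loc=true_value + 10, scale=0.1, size=1000)

# Instrument 2: less precise (wider spread) but accurate (centered on the truth).
accurate_but_noisy = rng.normal(loc=true_value, scale=5.0, size=1000)

for name, readings in [("precise but biased", precise_but_biased),
                       ("accurate but noisy", accurate_but_noisy)]:
    print(f"{name}: mean = {readings.mean():.2f}, spread (std) = {readings.std():.2f}")

# No amount of precision in the first instrument fixes its 10-point bias.
```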

This distinction is so important it effectively serves as a mental model. I’ll write more about precision versus accuracy in future articles but, until then, here’s a graphic to help illustrate the idea. I think it’s one of the best:

From Mr. Evan’s Science Website.


As you can see, statistics is a powerful tool so long as it’s pointed in the right direction. If you don’t know the direction, the truth, the agreed-upon language and context, you’ll get a fantastically precise and wildly inaccurate depiction of reality. This doesn’t make for lies, per se, but it does lead to mistaken assumptions that are nonetheless damaging.

I’m Not Wrong; I Just Failed To Disprove

There is a deeply profound, extremely useful concept in the Scottish court system that tangentially illustrates another important concept in statistics. It is a type of verdict called “not proven” or, more colloquially, “the Scottish Verdict.” As one of three potential verdicts in the courts, it provides an important middle ground between “guilty” and “not guilty”: there is some hint of suspicion still present in the jury’s mind, a sense that the defendant might be guilty, but an acknowledgement that the guilt isn’t proven.

This is effectively the same idea as the null hypothesis. As an extension of statistical inference, the null hypothesis is what helps us think analytically.

But let’s begin with inference. As Wheelan explains, “Statistical inference is really just the marriage [of] data and probability.”

Whole chapters are dedicated to this, so I’ll have to explain it a bit more poorly, with the null hypothesis as my anchor. It begins with the word “hypothesis.” We all have ideas about how the world works. With some data and some statistical tools, we can test those ideas, or rather those hypotheses, to see what holds true. As explained:

Any statistical inference begins with an implicit or explicit null hypothesis. This is our starting assumption, which will be rejected or not on the basis of subsequent statistical analysis. If we reject the null hypothesis, then we typically accept some alternative hypothesis that is more consistent with the data observed.

For example, in a court of law the starting assumption, or null hypothesis, is that the defendant is innocent.

Let’s say we determine that there is ironclad evidence to deem the defendant guilty. If that’s the case, the null hypothesis (that the defendant is innocent) is rejected, and the alternative hypothesis, that the defendant is guilty, is accepted.

I find this very powerful because it recognizes that there are two parts to the usual Eureka! moment of discovery when exploring data. The first part is the rejection of the null hypothesis; the second part is the acceptance of a new alternative. What we knew is wrong and this is what is right instead.

But what about those instances where the null hypothesis can’t be rejected? That’s when the concept is at its most powerful. A trial, experiment, or statistical sample might render a result that “fails to reject” the null hypothesis. That doesn’t mean the null hypothesis is proven, only that the trial failed to reject it. This shows that statistics doesn’t prove anything; it merely informs judgement. If a trial is unsuccessful, you face a decision: cut your losses or run a new trial. It isn’t like calculus. It isn’t black-and-white. Failing to reject the null hypothesis isn’t a wrong answer. It simply means the data didn’t do enough to overturn the default condition.

But what does the phrase “do enough” mean? Can we get any precision on that? This is where we enter the land of significance levels and confidence intervals. This is also where we return to the concept of the Scottish Verdict. There are regular moments in statistical analysis where the signal is just barely enough to show a glimmer of new information. A drug trial has mixed results that lean positive, pointing to the potential of rejecting a null hypothesis. How positive must the results be? This is determined by pre-ordained values such as significance levels and confidence intervals.

A common significance level is .05, the threshold against which a p-value is compared. What does that number mean? A simple way of thinking about it: a p-value below .05 says that, if the null hypothesis were true, there would be less than a 5% chance of getting data at least as extreme as what was actually observed. Put another way, if the null hypothesis were true and you repeated the analysis on many samples, fewer than 5% of those samples would produce a result this extreme. The threshold guards against the chance that your sample, through sheer bad luck, happens to contain unusually extreme data points.

Confused? Me too! And it’s another plug for the book, since it explains the idea so much better. I’ll forgo the talk of confidence intervals but, suffice it to say, they are a vital tool for checking that your data is really saying what you think it’s saying.
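
If it helps to see the moving parts in one place, here’s a minimal simulation sketch using an invented coin-flip experiment (the flip counts and the .05 threshold are just illustrative): state a null hypothesis, estimate a p-value, and compare it to a pre-chosen significance level.

```python
import numpy as np

rng = np.random.default_rng(1)

# Null hypothesis: the coin is fair (p = 0.5). Suppose we observe 60 heads in
# 100 flips (numbers invented) and ask how surprising that is under the null.
n_flips, observed_heads = 100, 60

# Simulate many experiments in which the null is true, and count how often a
# fair coin produces a result at least as extreme as ours (two-sided).
sims = rng.binomial(n_flips, 0.5, size=100_000)
p_value = np.mean(np.abs(sims - n_flips / 2) >= abs(observed_heads - n_flips / 2))

alpha = 0.05  # significance level chosen before looking at the data
if p_value < alpha:
    print(f"p = {p_value:.3f}: reject the null; the coin looks unfair")
else:
    print(f"p = {p_value:.3f}: fail to reject the null; 'not proven', not 'proven fair'")
```

With these invented numbers the estimated p-value lands just above .05, so the test fails to reject: the statistical equivalent of “not proven.”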

This is the major point of this section. Great statistics is about not only discovering patterns in data but also testing those patterns to ensure they support the inference you want to draw. If the results don’t clear a proper significance threshold, there is a real chance the pattern isn’t as solid as it appears. Through that probability, we get the following concept: statistics cannot be used to prove a hypothesis as simply true or false, emphatically, but rather true or false within a certain level of probability. Judgement must then come into play.

So let’s try another example to illustrate the core concept: imagine you are sick and you have a pain in your side. Doctors run an analysis of key health indicators and inform you that there is an 85% chance that you are suffering from cancer. The null hypothesis is that you don’t have cancer. The alternative hypothesis is that you do.

At 85%, the evidence isn’t strong enough to emphatically reject the null hypothesis at a conventional threshold, but failing to reject it doesn’t prove that you don’t have cancer, either. The probability isn’t 100%. There is only a signal. A strong one, at 85%, but is it strong enough? Do you then commission a surgery?

Perhaps so. Perhaps not. If not, it isn’t because the evidence proved the null hypothesis; it merely failed to reject it. The alternative hypothesis that you have cancer is simply, as the Scottish Verdict would declare, “not proven.” Nonetheless, you have a decision to make, and the data merely helps you understand the odds. Plenty of people still go against those odds regardless of the situation. Statistics simply helps you understand them. It is the best way to dance with uncertainty.

There is potential for error in all this analysis, of course. With probability playing such a key role, there are going to be times when a result that clears even a strict significance threshold still turns out to be wrong. When this occurs, it yields false findings known as “false positives” or “false negatives.”

The art of the analysis lies in understanding which error is tolerable and which isn’t. This depends entirely on the circumstances of your study. A Type 1 error, rejecting a null hypothesis that is actually true, yields a “false positive” and can be dangerous: “We ran the test and thought you had cancer (false positive), but you didn’t.” But a Type 2 error, failing to reject a null hypothesis that is actually false, is far worse in this circumstance: “We ran the test and didn’t think you had cancer (false negative), but you did.”

What is the acceptable error? It is based on your upside and downside risk profile. Good practice is to avoid downside risk wherever possible, knowing that, when facing downside risk, you’d rather “not be wrong than be right.”
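
The trade-off between the two error types is easy to see in a simulation sketch (the effect size, sample sizes, and thresholds below are all invented): tightening the significance level produces fewer false positives, but it also means missing real effects more often.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def error_rate(effect, alpha, n_experiments=2_000, n_per_group=50):
    """Share of simulated trials that reach the wrong conclusion at a given alpha."""
    errors = 0
    for _ in range(n_experiments):
        treatment = rng.normal(loc=effect, scale=1.0, size=n_per_group)
        control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        if effect == 0 and p < alpha:      # no real effect, yet we rejected: Type 1 (false positive)
            errors += 1
        elif effect != 0 and p >= alpha:   # real effect, yet we failed to reject: Type 2 (false negative)
            errors += 1
    return errors / n_experiments

print(f"Type 1 rate (no real effect, alpha=0.05): {error_rate(0.0, 0.05):.1%}")  # ~5%, by design
print(f"Type 2 rate (real effect,    alpha=0.05): {error_rate(0.5, 0.05):.1%}")  # misses some
print(f"Type 2 rate (real effect,    alpha=0.01): {error_rate(0.5, 0.01):.1%}")  # misses more
```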

Intuitively, we practice this a lot in uncertain situations, even if the formal language can make it hard to follow. That same formal language, though, provides clarity and gives us the necessary models to think it through. So perhaps this serves as another plug to buy the book.

It Was Probably The Probabilities … But I Regress

Finally, there is no way one can ever talk about a statistics book without touching on regression analysis. This is the technique we use to find patterns in large data sets. Looking for a relationship between your baseball team’s slugging percentage and offensive scoring against left-handed pitching? Run a regression. Want to know if retirees, as a cohort, spend the most money in casinos during weeknights? Run a regression. It is a fabulous way to understand conditions based on past data. By plotting the data on x-y axes, fitting a line to it, and computing a correlation coefficient that measures how tightly the points cluster around that line, you can learn whether a relationship is positive or negative, the trend (in a time series), the expected level of influence, and so much more. There are just a few problems to be aware of …
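
To see what a bare-bones regression looks like in practice, here’s a sketch with made-up data (the slugging-percentage and runs numbers are invented): fit a line, read the slope, and check the correlation coefficient.

```python
import numpy as np

# Invented data: team slugging percentage vs. runs scored against left-handed pitching.
slugging = np.array([0.38, 0.40, 0.41, 0.43, 0.45, 0.47, 0.48, 0.50])
runs     = np.array([3.9,  4.1,  4.4,  4.3,  4.8,  5.0,  5.1,  5.4])

slope, intercept = np.polyfit(slugging, runs, deg=1)  # least-squares line
r = np.corrcoef(slugging, runs)[0, 1]                 # correlation coefficient

print(f"Fitted line: slope = {slope:.1f}, intercept = {intercept:.1f}")
print(f"Correlation coefficient r = {r:.2f}")  # near 1: a strong positive relationship
```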

Consider the Thanksgiving Turkey and its regression-based prediction model. Over the course of several months, the turkeys in a domestic flock have observed a delightful and regular increase in food allotments from their caretakers. It appears, by all indications, that this trend will simply continue as the turkeys eat more, enjoy themselves, and experience an ever-improving quality of life. From now until, well, forever, they will enjoy the most wonderful state of heavenly, well-fed bliss. That’s what the data shows.

Such confidence! Such clarity! Until, of course, Thanksgiving arrives.

Source: A North Investments

Nassim Taleb introduced this illustration in his wonderful work “The Black Swan”. This will be another book review someday when we really start to tackle probabilities. The point here is simply that finding relationships in data does not make for a great prediction. Yes, there is a strong correlation between days passed and increased well-being for the turkeys. But with no awareness as to why this is occurring (they’re being fattened up to become future Butterballs), the final fateful day is a wild surprise to everyone.
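
In code, the turkey’s mistake is plain old extrapolation (the daily feed numbers below are invented): a line fit to a hundred days of improving conditions confidently predicts more of the same on day 101, because nothing in the data encodes why the feeding happens.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented history: 100 days of steadily increasing feed, with a little noise.
days = np.arange(1, 101)
feed = 50 + 0.5 * days + rng.normal(scale=2.0, size=days.size)

slope, intercept = np.polyfit(days, feed, deg=1)
r = np.corrcoef(days, feed)[0, 1]

print(f"Correlation over the first 100 days: r = {r:.2f}")           # very strong trend
print(f"Forecast for day 101: {slope * 101 + intercept:.0f} units")  # more food than ever
# Day 101 is Thanksgiving. The model has no variable for the caretakers' intent,
# so the regime change is invisible to it.
```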

It is reminiscent of the shock and surprise captured in the movie The Big Short. It is probably the best movie ever made for depicting how quickly we humans pattern-match events to fit our own preferred delusions. Imagine being the traders who knew the housing market was going to crash terribly. For years, they screamed and no one listened. It would be no different from the lone turkey in the yard who screams about the coming doom of November 22nd while its fat, happy compatriots just laugh it off. In both instances, everyone sees just one portion of the data–the good stuff–and misses the signal being provided elsewhere, a signal showing that a pile-up of rotten mortgage-backed securities would create a massive, nigh-inevitable sweep of defaults across the nation.

This is further evidence that statistics is a great tool for finding patterns in data. All the same, the questions that compel the analysis are far more important than the tools used to find the information. Mistakes abound when doing this work, and the more we understand those mistakes, the more we can avoid them. An entire chapter of the book is dedicated solely to mistakes made in conducting regression analysis.

Conclusion

All of this information makes for one very hard book review. I am simply scratching the surface, simplifying and condensing the work that Wheelan has done to also simplify and condense. I have packaged the packaging. All the same, I think there is one very important aspect of this review that I hope to convey:

I believe the modern era demands an understanding of statistics in order to be successful in most fields. But humility must come with it. At minimum, we will do well to think like a statistician. At worst, we’ll get arrogant and start to think that we are statisticians.

Don’t do that. Carry a few core concepts. Ingrain the notion of having no agenda and no ego, and be open to what the data tells you—especially when it doesn’t confirm your suspicion. But in all things, remember the context. The context of the situation, the analysis, the intent and motive behind the work … awareness of all this is necessary to guard against the foolish certainty we sometimes assign to numbers. Judgement matters far more than we sometimes realize. I occasionally wish this weren’t so.

I highly recommend this book as the starting point for refreshing our knowledge of statistics. You can buy it on Amazon.

Please note, too, that the only issue that prevents this wonderful work from being deemed a 10/10 is that it doesn’t give enough direction on where to go next. It lays a marvelous foundation but, without guidance on how to build further, I deduct a single point. For those who want to learn more, I strongly recommend the very difficult but very enlightening edX course The Analytics Edge.

Mental Models

  • Bell Curve/Normal Distribution
  • Central Limit Theorem
  • Regression to the Mean
  • Correlation not Causation (avoid narratives and consistency bias)
  • Statistics cannot be used to prove a hypothesis as simply true or false, emphatically, but rather true or false within a certain level of probability.
  • Expected Value
  • Value at Risk
  • Measure of dependency of variables
  • Significance Levels
  • Precision vs Accuracy – no amount of precision can make up for inaccuracy.
  • The questions that compel the analysis are far more important than the tools used to find the information.
  • The Null Hypothesis
  • Type 1 and Type 2 Error relative to Upside/Downside Risk
  • The Scottish Verdict
  • Probability doesn’t make mistakes; people using probability make mistakes.