Simanaitis Says

On cars, old, new and future; science & technology; vintage airplanes, computer flight simulation of them; Sherlockiana; our English language; travel; and other stuff

CAUSALITY?

“THERE ARE lies, damned lies—and statistics!” And one of the damnedest lies—which, I hasten to add, shouldn’t be blamed on statistics—involves the confusion of “causality” with “correlation.” This shows up in sociology, economics, politics and even, lamentably enough, in science.

A book reviewed in Science, 5 April 2013, addresses this. Here, I add my own comments as well.

Naked Statistics: Stripping the Dread from the Data, by Charles Wheelan, Norton, 2013. Both www.amazon.com and www.abebooks.com list it.

Wheelan’s book discusses topics of a first-year statistics course—median, variance, regression analysis—with examples from real life, as opposed to sock drawers. The Science reviewer, Evelyn J. Lamb, commends Wheelan’s common thread of skepticism, good advice for many of us confronted with masses of data.

Lamb notes that statistics can never prove causality, which is a good jumping-off point for discussing the difference between causality and correlation.

As the word suggests, correlation implies a relationship between sets of data—and nothing more.

By contrast, again as the word suggests, causality implies a very specific relationship—namely, one of cause and effect, of “if one thing, then the other.”

Here’s an example that’s also part of sociologic folklore: There’s excellent correlation between the salaries of teachers in a school district and that district’s sales of alcohol.

Amusing though it might be to imagine all those sotted educators, the truth likely is otherwise: Both teacher salaries and alcohol sales are logical effects of a school district’s affluence.
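To see how a shared cause can manufacture a correlation, here is a minimal Python sketch. Everything in it is invented for illustration: the 500 districts, the `affluence` score and the dollar figures are assumptions, not data from Wheelan or anyone else. Affluence drives both quantities, and neither one influences the other.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical districts: affluence is the only driver of both quantities.
affluence = [rng.uniform(0.0, 1.0) for _ in range(500)]
teacher_salary = [40_000 + 30_000 * a + rng.gauss(0, 3_000) for a in affluence]
alcohol_sales = [100 + 400 * a + rng.gauss(0, 40) for a in affluence]

# Correlation across all districts, rich and poor alike.
r_raw = pearson(teacher_salary, alcohol_sales)

# Hold affluence roughly constant by keeping only a narrow band of districts.
band = [i for i, a in enumerate(affluence) if 0.45 <= a <= 0.55]
r_within_band = pearson([teacher_salary[i] for i in band],
                        [alcohol_sales[i] for i in band])

print(f"all districts:          r = {r_raw:.2f}")
print(f"similar affluence only: r = {r_within_band:.2f}")
```

Across all the districts the two series track each other closely. Among districts of similar affluence, most of that relationship goes away, which is what you'd expect if affluence, not the teachers' bar tabs, is doing the work.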

Here’s a more subtle example cited in Naked Statistics: Are test scores at a high school consistently rising because the teachers and principal are unusually effective? Or are the worst students dropping out and thus skewing the sample?
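The dropout effect is easy to check with arithmetic. In this small Python sketch the ten scores are made up for illustration; the point is only that the class average can rise even though no individual student improves.

```python
import statistics

# Hypothetical scores for one class; no individual score ever changes.
scores = [48, 52, 55, 61, 67, 70, 74, 79, 85, 91]

mean_all = statistics.fmean(scores)

# Suppose the three lowest-scoring students drop out before the next test.
remaining = sorted(scores)[3:]
mean_remaining = statistics.fmean(remaining)

print(f"average with everyone:  {mean_all:.1f}")
print(f"average after dropouts: {mean_remaining:.1f}")
```

The average climbs by about seven points with no change in teaching at all, so a rising average by itself can't distinguish better instruction from a shrinking sample.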

On a similar theme of data sampling, the late (and politically liberal) film critic Pauline Kael is credited, likely apocryphally, with saying, “Nixon couldn’t have won. I don’t know anyone who voted for him.”

Wheelan also cites a 1936 poll in Literary Digest that predicted Republican Alf Landon would beat incumbent Democrat Franklin Delano Roosevelt. In its favor, the poll had a huge sample, 10 million people. However, they were all subscribers who owned both automobiles and telephones—not exactly typical folks in 1936.

Wheelan comments, “As polls with good samples get larger, they get better, since the margin of error shrinks. As polls with bad samples get larger, the pile of garbage just gets bigger and smellier.”
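Wheelan's point about sample size can be sketched in a few lines of Python. The support levels here are assumptions chosen for illustration, not the 1936 figures: `TRUE_SUPPORT` is the share of all voters backing a candidate, and `BIASED_SUPPORT` is the share within an unrepresentative group. The margin of error uses the usual 95 percent approximation for a simple random sample.

```python
import math
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

TRUE_SUPPORT = 0.60    # assumed share of all voters backing the candidate
BIASED_SUPPORT = 0.45  # assumed share within an unrepresentative group

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error for a simple random sample of size n."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

def poll(n, support):
    """Share of n simulated respondents who back the candidate."""
    return sum(rng.random() < support for _ in range(n)) / n

small_good = poll(1_000, TRUE_SUPPORT)
large_good = poll(100_000, TRUE_SUPPORT)
large_bad = poll(100_000, BIASED_SUPPORT)

print(f"margin of error, n=1,000:   {margin_of_error(1_000):.4f}")
print(f"margin of error, n=100,000: {margin_of_error(100_000):.4f}")
print(f"good sample, n=1,000:       {small_good:.3f}")
print(f"good sample, n=100,000:     {large_good:.3f}")
print(f"bad sample,  n=100,000:     {large_bad:.3f}")
```

A hundredfold increase in sample size cuts the margin of error tenfold, but only for the good sample. The big bad poll settles very precisely on the wrong answer and calls the race for the loser.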

The quality of the sample is important. I recall an example within the data bank of climate research: It’s said that, compared with pre- and post-war values, average daily temperatures rose during World War II.

Global warming generated by all the hostilities? No. Digging into the data revealed that, because of wartime blackout restrictions, a disproportionate share of readings were taken in daylight and fewer during the inherently cooler nighttime.
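The blackout effect comes down to a weighted average. In this Python sketch the temperatures and the reading counts are invented for illustration and are not taken from any climate record; the actual day and night temperatures never change, only how often each is logged.

```python
DAY_TEMP = 20.0    # assumed typical daytime reading, degrees C
NIGHT_TEMP = 10.0  # assumed typical nighttime reading, degrees C

def average_reading(n_day, n_night):
    """Mean of all logged readings, given how many were taken day vs. night."""
    total = n_day * DAY_TEMP + n_night * NIGHT_TEMP
    return total / (n_day + n_night)

balanced = average_reading(n_day=12, n_night=12)  # even day/night schedule
blackout = average_reading(n_day=12, n_night=4)   # fewer night readings

print(f"even schedule:     {balanced:.1f} C")
print(f"blackout schedule: {blackout:.1f} C")
```

Dropping two-thirds of the night readings lifts the logged average from 15.0 to 17.5 degrees with no change in the weather.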

There’s also “cherry picking,” the practice of selecting non-representative elements of data that just happen to corroborate one’s argument.

Even if the data are robust, correlation is just a starting point of an investigation. Is there any cause/effect taking place? Can common sense bolster, or discard, such a cause/effect relationship? Is there a mechanism that explains the causality? Or is more study warranted before proposing a cause/effect relationship?

The 1 March 2013 issue of Science has a news item, a mini-essay and a technical report on climate research. Briefly, changes in the concentration of atmospheric CO2 and surface air temperature are closely related; that is, they’re highly correlated. However, as cited in Science, temperature can influence atmospheric CO2 as well as be influenced by it.

The science of global warming isn’t “over.” Science never is. ds