SF Canada  
 

Article SPRING 2007

Lies, Damned Lies and Statistics
by
J. Brian Clarke

Statistics is a science, right?

Or perhaps not...

Example A:

Stop three people at random in the street and ask, 'Can you read?'

Two answer yes, the third no. Divide three into one and multiply by 100. The result is 33.3%, which tells you one third of the population is illiterate.

Example B:

Ask one million people the same question. All except one answer yes.

The one answers no. 1 divided by 1,000,000 and multiplied by 100 gives an answer of 0.0001 percent illiteracy. It is an insignificant quantity, so you ignore it and write in your report that the population is 100 percent literate. It does not bother the single illiterate because he cannot read the report anyway.
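The arithmetic in Examples A and B is easy to check. The short Python sketch below works out the share of "no" answers as a percentage of everyone asked. The helper name `percentage` is purely illustrative.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole (hypothetical helper)."""
    return part * 100 / whole

# Example A: one "no" out of the three people asked.
example_a = percentage(1, 3)
# Example B: one "no" out of the one million people asked.
example_b = percentage(1, 1_000_000)

print(f"Example A: {example_a:.1f}% illiterate")  # Example A: 33.3% illiterate
print(f"Example B: {example_b:.4f}% illiterate")  # Example B: 0.0001% illiterate
```

The division is not where either example goes wrong. Three answers are far too few to speak for a city, and rounding one person in a million down to zero is a choice made after the calculation, not part of it.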

Example C:

An election is called. At the start of the campaign, candidate A is slightly ahead of candidate B and there are a lot of undecided voters. To candidate A, the undecided are a disaster waiting to happen. So a professional polling organization is hired to survey a random sample of eligible voters. Presuming the pollsters confirm A's popular edge, that statistic is widely publicized. The publicity gains A many of the undecided; not because of his platform but because it is human nature to side with the winner. It is known in the trade as the jump-on-the-bandwagon syndrome.

If statistics really is a science, the examples tell us it is also one of the most misunderstood.

In its most basic sense, statistics is simply a means to determine the unknown by studying the known. But if the known statistical sample is not representative enough, any statistical study will produce wildly inaccurate results.

The premise that three people are a true statistical sample for a study of literacy in a large city is obviously absurd. We need a much larger sample. So if instead of only three people we question a thousand, does that give an accurate indication of the state of literacy?

Not necessarily.

For instance, 1,000 students on a university campus would clearly not be representative of the city's population. Even street sampling can be suspect, especially if the survey is done on the local equivalent of New York's Fifth Avenue, or in an industrial slum.

OK, so we split the sample to represent all economic classes: from suburb to downtown, from Luxuria to slum.

Unfortunately we have now opened a new can of worms. If our statistical survey is to produce an accurate literacy count for the entire city, how do we apportion the sub-samples? Do we need to hire an additional group of statisticians to determine the relative size of those economic units?
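One conventional way to apportion the sub-samples is proportional allocation: each area gets the same share of the interviews as it has of the city's population. The sketch below assumes those population figures are already known, which is exactly the extra work the question above worries about. The area names, the counts and the helper `proportional_allocation` are all invented for illustration.

```python
def proportional_allocation(strata: dict[str, int], sample_size: int) -> dict[str, int]:
    """Split sample_size across strata in proportion to each stratum's population.

    Hypothetical helper. Uses largest-remainder rounding so the sub-samples
    always add up to sample_size exactly.
    """
    total = sum(strata.values())
    # Exact (fractional) number of interviews each stratum deserves.
    exact = {name: sample_size * pop / total for name, pop in strata.items()}
    # Start from the whole-number part of each share.
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Hand any remaining interviews to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

# Made-up population counts for four economic areas of one city.
city = {"suburb": 450_000, "downtown": 200_000, "Luxuria": 50_000, "slum": 300_000}
print(proportional_allocation(city, 1000))
# {'suburb': 450, 'downtown': 200, 'Luxuria': 50, 'slum': 300}
```

The rounding step matters only when the shares do not come out as whole numbers. The hard part remains the one the article points to: the allocation is only as good as the population figures fed into it.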

It is an argument that can go on ad infinitum. But it does illustrate the difficulty of determining a true representative sample.

Deciding what statistical information is or is not significant can be a tough call. If one person in 10,000 stirs his or her coffee anti-clockwise, it is a statistic even the least of us can do without. But if one child in a million suffers from a disease which cannot be treated because it happens to be statistically insignificant, then perhaps we should wonder if numbers are being used to smother compassion and save tax dollars.

A phenomenon of our times is the unholy mix of politics and statistics. It is often said that an electorate gets the kind of government it deserves, so I suppose we can blame no one except ourselves if our votes are influenced more by glibly quoted numbers than by candidates' policies. Using the statistics of voter support to improve the statistics of voter support, candidate A is literally hauling himself up by his own bootstraps!

Is there a lesson to be learned? Should all statistical information be regarded as suspect?

Again, not necessarily.

Nature in its infinite variation is mostly hidden from our direct view. But because natural laws apply equally at the other end of the universe as on our own small world, the science of statistics gives us the ability to understand what we cannot see by studying what we can. It is knowing the desert from a cupful of sand.

Human beings are not consistent. Neither for individuals nor for the groups we call nations is there any accurate mathematical model which can be used to predict future acts or consequences. Science fiction notwithstanding (in his Foundation stories, Isaac Asimov invented "psychohistory" to predict and even to manipulate the future of human affairs), statistics can only indicate trends. But in terms of planning industrial production, knowing what food crops to put in the ground, or even trying to decide when to call an election, the knowledge of such trends can be valuable.

In any context statistics is therefore a true science, and an important one. But its results must be interpreted for what they are, not what we want them to be.

Perhaps 99.99% of those who read this will either agree or be indifferent. But if the remaining 'insignificant' 0.01% happens to be a psychotic statistician who harbors murderous intent against anyone who dares to hint that his beloved science has limitations...


Posted May 25, 2007