Statistics is a science, right?
Or perhaps not...
Example A:
Stop three people at random in the street and
ask, 'Can you read?'
Two answer yes, the third no. Divide three into
one and multiply by 100. The result is 33.3%, which tells you one third
of the population is illiterate.
Example B:
Ask one million people the same question. All
except one answer yes.
The one answers no. One divided by 1,000,000 and
multiplied by 100 gives an answer of 0.0001 percent illiteracy. It is an
insignificant quantity, so you ignore it and write in your report the
population is 100 percent literate. It does not bother the single
illiterate because he cannot read the report anyway.
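For anyone who wants to check the arithmetic in Examples A and B, here is a minimal sketch. The helper name `percent` is made up for illustration; the calculation is simply the number of 'no' answers divided by the number of people asked, times 100.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Example A: one 'no' among three people asked
example_a = percent(1, 3)

# Example B: one 'no' among one million people asked
example_b = percent(1, 1_000_000)

print(f"Example A: {example_a:.1f}% illiterate")
print(f"Example B: {example_b:.4f}% illiterate")
```

The sums are right in both cases. It is the conclusions drawn from them that deserve suspicion.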
Example C:
An election is called. At the start of the campaign,
candidate A is slightly ahead of candidate B and there are a lot of undecided
voters. To candidate A, the undecided are a disaster waiting to happen. So a
professional polling organization is hired to survey a random sample of eligible
voters. Presuming the pollsters confirm A's popular edge, that statistic is
widely publicized. The publicity gains A many of the undecided, not because of
his platform but because it is human nature to side with the winner. It is known
in the trade as the jump-on-the-bandwagon syndrome.
If statistics really is a science, the examples tell us
it is also one of the most misunderstood.
In its most basic sense, statistics is simply a means to
determine the unknown by studying the known. But if the known statistical sample
is not representative enough, any statistical study will produce wildly
inaccurate results.
The premise that three people are a true statistical sample for a study of
literacy in a large city is obviously absurd. We need a much
larger sample. So if instead of only three people we question a thousand, does
that give an accurate indication of the state of literacy?
Not necessarily.
For instance, 1,000 students on a university campus
would clearly not be representative of the city's population. Even street
sampling can be suspect, especially if the survey is done on the local
equivalent of New York's Fifth Avenue, or in an industrial slum.
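To put rough numbers on the sample-size question, the sketch below uses the textbook normal-approximation margin of error for a proportion. It rests on assumptions the essay does not make: a 95 percent confidence level (z = 1.96), a truly random sample, and an illustrative 'no' share of one third borrowed from Example A. The function name `margin_of_error` is hypothetical, and the approximation is itself shaky for a sample as small as three.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from n answers.

    Normal approximation at roughly 95% confidence (z = 1.96).
    Assumes a simple random sample; it says nothing about a biased one.
    """
    return z * math.sqrt(p * (1 - p) / n)

p = 1 / 3  # illustrative share answering 'no', as in Example A

for n in (3, 1000):
    moe = margin_of_error(p, n)
    print(f"n = {n:>4}: {p:.1%} plus or minus {moe:.1%}")
```

With three people the uncertainty is around 53 percentage points either way; with a thousand it shrinks to about 3. But that figure covers only the luck of the draw. A thousand students on a campus give a tight margin around the wrong answer.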
OK, so we split the sample to represent all economic
classes: from suburb to downtown, from Luxuria to slum.
Unfortunately we have now opened a new can of worms. If
our statistical survey is to produce an accurate literacy count for the entire
city, how do we apportion the sub-samples? Do we need to hire an additional
group of statisticians to determine the relative size of those economic units?
It is an argument that can go on ad infinitum.
But it does illustrate the difficulty of determining a truly representative
sample.
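The apportionment problem can be sketched in a few lines. Everything here is hypothetical: the three strata, their shares of the city's population, and the literacy rates found in each sub-sample are invented purely to show how much the answer depends on the weights.

```python
# Hypothetical strata: (share of city population, literacy rate in that sub-sample)
strata = {
    "suburb":   (0.50, 0.98),
    "downtown": (0.30, 0.95),
    "slum":     (0.20, 0.70),
}

def weighted_rate(strata):
    """City-wide literacy estimate, weighting each stratum by its population share."""
    total_share = sum(share for share, _ in strata.values())
    return sum(share * rate for share, rate in strata.values()) / total_share

def unweighted_rate(strata):
    """Naive estimate that treats every stratum as the same size."""
    rates = [rate for _, rate in strata.values()]
    return sum(rates) / len(rates)

print(f"Weighted by population share: {weighted_rate(strata):.1%}")
print(f"Treating strata as equal:     {unweighted_rate(strata):.1%}")
```

With these made-up figures the two estimates come out at about 91.5 percent and 87.7 percent. The sub-samples are identical in both cases; only the assumed size of each economic unit has changed, which is exactly the number we do not have.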
Deciding what statistical information is
or is not significant can be tough. If one person in 10,000 stirs his or
her coffee anti-clockwise, it is a statistic even the least of us can do
without. But if one child in a million suffers from a disease which goes
untreated because it happens to be statistically insignificant, then perhaps we
should wonder if numbers are being used to smother compassion and save tax
dollars.
A phenomenon of our times is the unholy mix of politics
and statistics. It is often said that an electorate gets the kind of government
it deserves, so I suppose we can blame no one except ourselves if our votes are
influenced more by glibly quoted numbers than by candidates' policies. Using the
statistics of voter support to improve the statistics of voter support,
candidate A is hauling himself up by his own bootstraps!
Is there a lesson to be learned? Should all statistical
information be regarded as suspect?
Again, not necessarily.
Nature in its infinite variation is mostly hidden from
our direct view. But because natural laws apply at the other end of the
universe just as they do on our own small world, the science of statistics gives us the
ability to understand what we cannot see by studying what we can. It is knowing
the desert from a cupful of sand.
Human beings are not consistent. Neither for individuals
nor for the groups we call nations is there any accurate mathematical model which
can be used to predict future acts or consequences. Science fiction
notwithstanding (in his Foundation stories, Isaac Asimov invented
"psychohistory" to predict and even to manipulate the future of human affairs),
statistics can only indicate trends. But in terms of planning industrial
production, knowing what food crops to put in the ground, or even trying to
decide when to call an election, the knowledge of such trends can be valuable.
In any context statistics is therefore a true science,
and an important one. But its results must be interpreted for what they are, not
what we want them to be.
Perhaps 99.99% of those who read this will either agree
or be indifferent. But if the remaining 'insignificant' 0.01% happens to be a
psychotic statistician who harbors murderous intent against anyone who dares to
hint that his beloved science has limitations...