To test a hypothesis in statistics, we often use what is called the p-value method. Without resorting to technical jargon, it asks whether something has happened by chance or by design. If chance is a plausible explanation, we say that the Null Hypothesis (the claim that the result is due to chance) holds up. Technically, we say that we fail to reject the Null, much as a court's verdict is either "guilty" or "not guilty," never "guilty" or "innocent." If, on the other hand, the event cannot be explained on the basis of chance, we conclude that it happened by design. That marks the event as being 'statistically significant.'
The reasoning seems plausible. The trouble arises when we try to separate events that happen by chance from events that happen by design by erecting a barrier between them. In statistics, that barrier is a probability cut-off value called the significance level, and it is the value we choose that is increasingly turning out to be the problem.
This is how Hypothesis Testing works in statistics. We compute the probability of seeing a result at least as extreme as the one observed, on the assumption that the Null Hypothesis is true (that is, chance is a valid explanation). That probability is the p-value. We then compare it with the significance level. If the p-value is less than (or equal to) the significance level, we say that the probability is too low for the event to have happened by chance. Thus, it had to happen by design, that is, someone or something is working behind the scenes to make it happen. If, on the other hand, the p-value is greater than the significance level, chance remains a perfectly plausible explanation, so we accept the Null Hypothesis (well, technically, we say we do not reject the Null Hypothesis). In other words, there is no surprise here. It's business as usual.
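The decision rule above can be sketched in a few lines of Python. The p-values and the significance level here are made-up numbers, purely for illustration:

```python
# Illustrative p-value decision rule. The inputs are made-up numbers.

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha."""
    if p_value <= alpha:
        # Too unlikely under the Null: chance is a poor explanation.
        return "reject the Null Hypothesis (statistically significant)"
    # Chance remains a plausible explanation.
    return "fail to reject the Null Hypothesis (not significant)"

print(decide(0.03))  # below 0.05, so significant
print(decide(0.20))  # above 0.05, so not significant
```

Note that the rule is entirely mechanical: all the judgment is packed into the choice of alpha, which is exactly where the trouble lies.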
So where is the problem? It lies in the value of the significance level we normally choose to decide whether an event has entered the realm of significance or not.
By default, the significance level is chosen to be 0.05. It means that if the p-value is greater than the significance level, the event could have happened by chance. It is not significant. But if the p-value is less than 0.05, then the probability of it happening by chance is so low that we say it could not have happened by chance. In other words, it is significant. An event is statistically significant if the Null Hypothesis is rejected.
But more and more, we are finding that a significance level of 0.05 is too high. It sets the bar too low for the evidence against the Null Hypothesis, so events flagged as significant often turn out, on closer inspection, not to be.
Here is an example: a new drug helps 40 out of 100 patients, whereas, without the drug, only 30 out of 100 are cured. That is 10 more patients cured than with no intervention. But is the difference significant, particularly when the treatment-control experiment is replicated many more times?
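The article does not name a specific test, but one standard way to answer that question is a two-proportion z-test. Here is a sketch, using only Python's standard library, applied to the hypothetical 40-vs-30 trial above:

```python
# Two-proportion z-test for the hypothetical drug trial:
# 40 of 100 cured with the drug vs 30 of 100 without it.
from math import sqrt, erfc

cured_drug, n_drug = 40, 100
cured_ctrl, n_ctrl = 30, 100

p1 = cured_drug / n_drug
p2 = cured_ctrl / n_ctrl
p_pool = (cured_drug + cured_ctrl) / (n_drug + n_ctrl)  # pooled rate under the Null

se = sqrt(p_pool * (1 - p_pool) * (1 / n_drug + 1 / n_ctrl))
z = (p1 - p2) / se
p_value = erfc(z / sqrt(2))  # two-sided p-value from the normal distribution

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Under this test the two-sided p-value comes out around 0.14, well above 0.05, so by the usual rule even this 10-patient difference is not statistically significant; a single encouraging-looking gap can easily be noise.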
Notice the contrast between the significance level and the height of the bar. If the significance level is high, the bar is set too low to claim significance. On the other hand, if the significance level is very low, the bar to provide evidence against the Null Hypothesis is that much higher.
So what should we do? A consensus is emerging that the significance level should be lowered to 0.005. That makes the threshold for claiming that something happened by design, rather than by chance, ten times stricter. It will separate the signal from the noise in a far more conclusive way than a significance level of 0.05 does.
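To see how much higher the bar goes, we can compare the critical z-values that a two-sided test demands at each significance level, using Python's built-in `statistics.NormalDist`:

```python
# How much higher does the bar go when alpha drops from 0.05 to 0.005?
# Compare the two-sided critical z-values at each significance level.
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

for alpha in (0.05, 0.005):
    # A two-sided test splits alpha evenly across both tails.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha}: need |z| > {z_crit:.2f}")
```

Lowering alpha tenfold moves the cutoff from about 1.96 to about 2.81 standard deviations, a much stricter demand on the evidence before an effect can be called real.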
So, to set the significance bar higher in statistics, lower the significance level. That will keep the fake at bay and usher in only the genuine.
(Want to raise the numerical and statistical literacy of your rank-and-file to unleash their creativity? Contact me today.)