# Normality tests

Several tests can be used to accept or reject the hypothesis that the distribution of parts follows a normal distribution. The most commonly used are:
• The Chi2 test
• The Anderson-Darling test

## Normality test using the Chi2 test

To check the normality of a distribution, our first intuition would be to plot the histogram of the observed values, then check whether this histogram more or less resembles the usual Gaussian curve.

This is exactly the principle behind the Chi2 test, which adds a small dose of statistical calculation to this intuition. The principle is as follows.

For each bar in the histogram, we can calculate:
• $N_i$: the number of parts actually observed (in this case 10)
• $NP_i$: the number of parts that would theoretically be observed if the distribution were normal (here 9.2)
• $d_i$: the "number of misplaced parts", defined as

$d_i=\frac{(N_i-NP_i)^{2}}{NP_i}$
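
As a quick check using the values quoted above (10 parts observed, 9.2 expected), the contribution of that bar would be:

$d_i=\frac{(10-9.2)^{2}}{9.2}=\frac{0.64}{9.2}\approx 0.07$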

We then calculate the sum over all classes:

$D=\sum_{i=1}^{n}d_i$

It turns out that D follows a Chi2 distribution with n-2 degrees of freedom (n being the number of classes). We can then calculate the probability of obtaining a value at least as large as the one observed.
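
The calculation of D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chi2_statistic`, the class boundaries and the observed counts are hypothetical, and the mean and standard deviation are taken as given here, whereas in practice they would be estimated from the sample.

```python
from statistics import NormalDist

def chi2_statistic(observed, edges, mean, std):
    """Return (expected counts, per-class d_i, D) for a binned sample.

    observed: counts N_i per class; edges: class boundaries (len(observed) + 1).
    mean/std: parameters of the normal law used as reference (assumed known).
    """
    n_total = sum(observed)
    normal = NormalDist(mean, std)
    last = len(observed) - 1
    expected, contributions = [], []
    for i, n_i in enumerate(observed):
        # Outer classes are treated as open-ended so the probabilities sum to 1.
        lower = 0.0 if i == 0 else normal.cdf(edges[i])
        upper = 1.0 if i == last else normal.cdf(edges[i + 1])
        np_i = n_total * (upper - lower)                 # N * P_i
        expected.append(np_i)
        contributions.append((n_i - np_i) ** 2 / np_i)   # d_i
    return expected, contributions, sum(contributions)

# Hypothetical histogram: 7 classes of width 1, 100 parts in total.
edges = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
observed = [1, 6, 24, 38, 25, 5, 1]
expected, d, D = chi2_statistic(observed, edges, mean=0.0, std=1.0)
print([round(e, 2) for e in expected], round(D, 3))
```

Each entry of `d` corresponds to one bar of the histogram, so a large value immediately points to the class where the observed count departs most from the normal model.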

For example, for a histogram with 7 classes (5 degrees of freedom), if we have calculated D = 11.07, there is a 5% probability of obtaining such a value or more if the distribution of the parts is indeed normal.
The result of the test, X, is therefore 5%, and the general conclusion is as follows:
• If X < 5%: the assumption of normality is rejected, and the distribution is not considered to follow a normal distribution.
• If X >= 5%: the assumption of normality is accepted, and the distribution can be considered to follow a normal distribution.
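
The example above can be reproduced with the standard library alone. The sketch below is not the software's own implementation: `chi2_sf` and `normality_verdict` are hypothetical helper names, the survival function uses the closed-form expressions valid for integer degrees of freedom, and the degrees of freedom are taken as n-2, as stated in the text.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for a Chi2 variable with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    total = 0.0
    term = math.sqrt(x)                      # x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)              # next term: x^(r+1/2) / (1*3*...*(2r+1))
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total

def normality_verdict(D, n_classes, alpha=0.05):
    """Return (X, accepted) where X is the probability of a value >= D."""
    df = n_classes - 2                       # degrees of freedom as stated in the text
    x = chi2_sf(D, df)
    return x, x >= alpha

p, accepted = normality_verdict(11.07, 7)
print(round(p, 3))                           # close to 0.05, the borderline case
print(normality_verdict(20.0, 7))            # much larger D: normality rejected
```

D = 11.07 sits almost exactly on the 5% threshold, so it is a poor case for illustrating the decision itself. The second call shows a clear rejection.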

## What to do in the event of non-normality

The central limit theorem tells us:
Any system, resulting from the sum of many factors independent of each other and of an equivalent order of magnitude, generates a distribution law that tends towards a normal distribution.
But we can also reason in the opposite way. If we observe a distribution that is not normal, then one of the hypotheses of the theorem is not valid:
• Case 1: The system is not the sum of many factors: it may be the product of many factors, or some other combination. In this case, the distribution law may be different and, in general, a transformation (taking the log of the result, for example) will restore a normal distribution.
• Case 2: The factors are not independent of each other.
• Case 3: The factors are not of the same order of magnitude:
  • One factor outweighs the others. In this case, we need to find the factor in question, because it alone generates a major source of variability.
  • An outlier is polluting the distribution. In this case, we need to find the cause of the outlier and eliminate it if the cause can be explained.

In these two cases, it is not necessary to find a distribution law corresponding to the variability observed. Such a distribution law would not be repeatable over time because it is due to a single parameter, so it would have no predictive properties.
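
To help locate a suspected outlier before investigating its cause, one possible screening step is sketched below. The text does not prescribe a detection method. The interquartile-range rule used here, the function name `flag_outliers`, the factor `k=1.5` and the measurements are all assumptions chosen for illustration.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    This is a common screening rule, not the method of any specific tool;
    flagged points are candidates to investigate, not values to delete blindly.
    """
    q1, _, q3 = quantiles(values, n=4)       # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspicious value.
measurements = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 14.5]
suspects = flag_outliers(measurements)
print(suspects)
```

A flagged value should only be removed once its cause has been identified and explained, as noted above.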
If the non-normality is due to Case 1, you will need to find the corresponding distribution law, particularly if you want to predict the percentage of values outside tolerance. To do this, you can use the distribution laws suggested at the bottom of the window in the Data Analysis module to see whether one of them fits the observed data well.
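
The effect of the log transformation mentioned for Case 1 can be illustrated on simulated data. In this sketch, the simulated system (a product of 20 independent positive factors), the sample size and the use of skewness as a rough symmetry indicator are all assumptions made for the example.

```python
import math
import random
from statistics import fmean, pstdev

def skewness(values):
    """Sample skewness: close to 0 for a symmetric (e.g. normal) distribution."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(0)                       # fixed seed for reproducibility

# Hypothetical system: each result is the PRODUCT of 20 independent factors.
raw = [math.prod(rng.uniform(0.5, 1.5) for _ in range(20)) for _ in range(5000)]

# Taking the log turns the product into a sum of independent terms,
# which brings the central limit theorem back into play.
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

The raw values are strongly right-skewed, while the logged values are close to symmetric. A normality test could then be run on the transformed data.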
