- The Chi2 test
- The Anderson-Darling test

## Normality test using the Chi2 test

To check whether a distribution is normal, our first intuition would be to plot a histogram of the observed values and then see whether this histogram more or less resembles the usual Gaussian curve.

This is exactly the principle behind the Chi2 test, which adds a small dose of statistical calculation to this intuition. For each class of the histogram, we calculate:

d_i=\frac{(N_i-NP_i)^{2}}{NP_i}

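- N_i: the number of parts actually observed in the class (in this case 10)
- NP_i: the number of parts that would theoretically be observed in the class if the distribution were normal, i.e. the total number of parts N multiplied by the normal probability P_i of the class (here 9.2)
- d_i: represents the "number of misplaced parts" in the class.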
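As a quick check of the formula, here is a minimal Python sketch that evaluates d_i for the values quoted above (10 parts observed, 9.2 expected). The function name `class_contribution` is only an illustrative choice.

```python
def class_contribution(observed: float, expected: float) -> float:
    """Return d_i = (N_i - N*P_i)^2 / (N*P_i) for one histogram class."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) ** 2 / expected


# Values from the example: N_i = 10 observed, N*P_i = 9.2 expected.
d = class_contribution(10, 9.2)
print(round(d, 4))  # 0.0696
```

A class whose observed count is close to the expected count contributes almost nothing; the contribution grows with the square of the gap.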

We then calculate

D=\sum_{i=1}^{n}d_i

and it turns out that D follows a Chi2 distribution with n-2 degrees of freedom (n being the number of classes). We can then calculate the probability of obtaining a value at least this large if the distribution really were normal.

- If this probability is below 5%: the assumption of normality is rejected, and the distribution of the variables is not considered to follow a normal distribution.
- If this probability is 5% or more: the assumption of normality is not rejected, and the distribution can be considered to follow a normal distribution.
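The whole procedure can be sketched in Python with the standard library alone. This is an illustration under stated assumptions, not a reference implementation: the mean and standard deviation are estimated from the sample, the classes are chosen so that each has the same expected count (a histogram with other class limits works the same way), and the n-2 degrees of freedom follow the text above (some references instead subtract one further degree of freedom per estimated parameter). The names `chi2_sf` and `chi2_normality_test` are hypothetical helpers written for this example.

```python
import bisect
import math
import random
from statistics import NormalDist, fmean, stdev


def chi2_sf(x: float, df: int) -> float:
    """Probability that a Chi2 variable with df degrees of freedom is >= x."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function.
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_normality_test(data, n_classes=8, alpha=0.05):
    """Return (D, probability, normality_not_rejected) for the sample."""
    n_parts = len(data)
    fitted = NormalDist(fmean(data), stdev(data))
    # Assumption: equal-probability classes, so N*P_i is the same for each class.
    edges = [fitted.inv_cdf(k / n_classes) for k in range(1, n_classes)]
    observed = [0] * n_classes
    for value in data:
        observed[bisect.bisect_right(edges, value)] += 1
    expected = n_parts / n_classes
    # D is the sum of the per-class contributions d_i.
    d_total = sum((n_i - expected) ** 2 / expected for n_i in observed)
    # Degrees of freedom n-2, as stated in the text.
    probability = chi2_sf(d_total, n_classes - 2)
    return d_total, probability, probability >= alpha


rng = random.Random(0)
sample = [rng.gauss(50.0, 2.0) for _ in range(200)]  # simulated measurements
d_total, p_value, looks_normal = chi2_normality_test(sample)
print(f"D = {d_total:.2f}, probability = {p_value:.3f}, not rejected: {looks_normal}")
```

If SciPy is available, `scipy.stats.chi2.sf` can replace the hand-written `chi2_sf` helper.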

## What to do in the event of non-normality

- Case 1: The system is not the sum of many factors; it may, for example, be the product of many factors. In this case, the distribution law may be different and, in general, a transformation (taking the log of the result, for example) will restore a normal distribution.
- Case 2: The factors are not independent of each other.
- Case 3: The factors are not of the same order of magnitude:
  - One factor outweighs the others. In this case, we need to find the factor in question, because it alone generates a major source of variability.
  - An outlier is polluting the distribution. In this case, we need to find the cause of the outlier and eliminate the outlier if the cause can be explained.
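To illustrate Case 1, the sketch below simulates a hypothetical process whose result is the product of 20 independent factors and compares the skewness of the raw results with that of their logarithms. The simulated factors, the sample size and the `skewness` helper are assumptions made for this example only.

```python
import math
import random
from statistics import fmean, pstdev


def skewness(values):
    """Sample skewness: close to 0 for a symmetric (e.g. normal) distribution."""
    mean, sd = fmean(values), pstdev(values)
    return fmean(((v - mean) / sd) ** 3 for v in values)


rng = random.Random(42)
# Hypothetical process: each result is the product of 20 independent factors.
products = [
    math.prod(rng.uniform(0.8, 1.25) for _ in range(20)) for _ in range(2000)
]
# The log turns the product into a sum of many factors.
logs = [math.log(p) for p in products]

print(f"skewness before log: {skewness(products):.2f}")
print(f"skewness after log:  {skewness(logs):.2f}")
```

The raw results have a long right tail, while their logarithms are a sum of many independent terms and are therefore much closer to a normal distribution.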

## Here are the modules you can use to calculate these indicators:

### Data Analysis

Statistics in 1 click. Use the power of statistics to find out what's behind your production data from the SPC, APC or IQC modules. Thanks to its machine learning algorithms, the Data Analysis module can be used to understand the origin of machine drift or to differentiate suppliers statistically.