When analysing a series of data, we sometimes come across values that do not appear to belong to the normal distribution of the data. These points are known as outliers. Intuition alone is not a reliable way to decide whether a value is an outlier, so statistical tests are used to flag them.

From a statistical point of view, an outlier is a value that does not belong to the normal distribution of the data. It can come from:

- A measurement or transcription error (for example, a forgotten decimal point)
- A special cause, such as a part not being washed before measuring

All statistical calculations using the properties of the normal distribution (statistical tests, capability calculations, out-of-tolerance % calculations) are very sensitive to the presence of outliers, so it is important to understand their origin and eliminate them before using these calculations. Non-parametric statistical tests, which are much less sensitive to outliers, may also be used.

Two main tests are used:

- Dixon test: particularly useful when the number of data points is low (< 30)
- Grubbs test: can be used in all cases.

## Dixon test

Dixon's test is based on two distances:

- b = the overall range of the measurements (here 14.1)
- a = the distance between the value suspected of being an outlier and its nearest neighbour (here 8.6)

The ratio a / b is then calculated, here expressed as a percentage.

This ratio is then compared with the limit given in Dixon's table:

Number of parts | 3 | 5 | 10 | 16 | 20 | 30 |
---|---|---|---|---|---|---|
Maximum ratio | 0.94 | 0.72 | 0.46 | 0.38 | 0.34 | 0.30 |

If the calculated ratio is less than the maximum ratio given by the table, the value is not an outlier. Here the ratio of about 61% (8.6 / 14.1) for 5 parts is below the limit of 72% (0.72). The point is therefore not considered an outlier.
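The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the sample measurements are made up so that the range is 14.1 and the gap is 8.6, and the sketch only accepts sample sizes that appear in the table rather than interpolating between them.

```python
# Maximum ratios copied from the table above (number of parts -> limit).
DIXON_LIMITS = {3: 0.94, 5: 0.72, 10: 0.46, 16: 0.38, 20: 0.34, 30: 0.30}


def dixon_ratio(values):
    """Return (ratio, suspect) for the extreme value with the larger gap."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("need at least 3 values")
    b = data[-1] - data[0]            # b: overall range of the measurements
    if b == 0:
        raise ValueError("all values are identical")
    gap_low = data[1] - data[0]       # a, if the smallest value is the suspect
    gap_high = data[-1] - data[-2]    # a, if the largest value is the suspect
    if gap_high >= gap_low:
        return gap_high / b, data[-1]
    return gap_low / b, data[0]


def is_dixon_outlier(values):
    """Compare the ratio a / b with the tabulated limit for this sample size."""
    n = len(values)
    if n not in DIXON_LIMITS:
        # Assumption of this sketch: no interpolation between table columns.
        raise ValueError(f"no tabulated limit for n={n}")
    ratio, suspect = dixon_ratio(values)
    return ratio > DIXON_LIMITS[n], ratio, suspect


# Hypothetical measurements chosen so that b = 14.1 and a = 8.6.
sample = [10.0, 11.2, 12.5, 15.5, 24.1]
flag, ratio, suspect = is_dixon_outlier(sample)
print(f"suspect={suspect}, ratio={ratio:.2f}, limit={DIXON_LIMITS[len(sample)]}, outlier={flag}")
```

With these made-up values the ratio stays below the limit for 5 parts, so the largest value is not flagged.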

## Grubbs test

To use the Grubbs test, we first calculate:

- X: the average of all measurements
- S: the standard deviation of all measurements
- G: the distance between the value suspected of being an outlier and the mean, expressed in standard deviations

$$G = \frac{|\text{Value} - X|}{S}$$

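This value is compared with the limit:

$$G_{limit} = \frac{N-1}{\sqrt{N}} \cdot \sqrt{\frac{t^2_{\alpha/N,\,N-2}}{N-2+t^2_{\alpha/N,\,N-2}}}$$

where N is the number of measurements, α is the chosen significance level, and t is the critical value of Student's t distribution with N-2 degrees of freedom at level α/N.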

If G is greater than the limit, the value is considered an outlier; otherwise it is not.
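As a rough sketch, the two formulas above can be evaluated with Python's standard library. Several points are assumptions of this example rather than part of the method as described: the function names are hypothetical, S is taken as the sample standard deviation (`statistics.stdev`), the measurements are made up, and the Student t critical value is passed in by hand from a standard t table because the standard library has no t distribution.

```python
import math
import statistics


def grubbs_statistic(values, suspect):
    """G: distance between the suspect value and the mean, in units of S."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)      # assumption: sample standard deviation
    return abs(suspect - x_bar) / s


def grubbs_limit(n, t_crit):
    """Limit for G, given the t critical value at level alpha/N with N-2 d.o.f."""
    t2 = t_crit ** 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t2 / (n - 2 + t2))


# Hypothetical measurements; the suspect is the largest value.
sample = [10.0, 11.2, 12.5, 15.5, 24.1]
n = len(sample)

# With alpha = 0.05 and N = 5, alpha/N = 0.01 and N - 2 = 3 degrees of freedom.
# 4.541 is the usual one-sided table value for that case; if SciPy is available,
# scipy.stats.t.ppf(1 - 0.05 / n, n - 2) can be used instead.
t_crit = 4.541

g = grubbs_statistic(sample, max(sample))
g_limit = grubbs_limit(n, t_crit)
is_outlier = g > g_limit
print(f"G={g:.3f}, G_limit={g_limit:.3f}, outlier={is_outlier}")
```

Passing the t value as a parameter keeps the sketch free of external dependencies; in practice it would normally be computed for the chosen α and N rather than read from a table.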

## Modules you can use to calculate these indicators

### Data Analysis

Statistics in 1 click. Use the power of statistics to find out what's behind your production data from the SPC, APC or IQC modules. Thanks to its machine learning algorithms, the Data Analysis module can be used to understand the origin of machine drift or to differentiate suppliers statistically.