When analysing a series of data, we sometimes come across values that do not appear to be part of the normal distribution of the data. These points are known as outliers, and intuition alone is not a reliable way to decide whether a value is an outlier or not. Statistical tests can highlight them, and statistical software such as Ellistat can help with the calculations.
From a statistical point of view, an outlier is a value that does not belong to the normal distribution of the data. It can come from:
- A measurement or copying error (for example, a forgotten decimal point)
- A special cause, such as a part not being washed before measuring.
All statistical calculations using the properties of the normal distribution (statistical tests, capability calculations, out-of-tolerance % calculations) are very sensitive to the presence of outliers, so it is important to understand their origin and eliminate them before using these calculations. Non-parametric statistical tests, which are much less sensitive to outliers, may also be used.
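To make this sensitivity concrete, the short sketch below compares the mean, standard deviation and median of a small sample with and without a decimal-point error. The measurements are invented for illustration (9.8 entered as 98) and the helper name `summarize` is hypothetical; none of this comes from the original text.

```python
from statistics import mean, median, stdev

# Hypothetical measurements, invented for this illustration.
clean = [10.0, 10.2, 9.9, 10.1, 9.8]
# Same sample with a copying error: 9.8 entered as 98 (decimal point forgotten).
with_error = [10.0, 10.2, 9.9, 10.1, 98.0]


def summarize(values):
    """Return the mean, sample standard deviation and median of a sample."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "median": median(values),
    }


for label, sample in (("clean", clean), ("with error", with_error)):
    stats = summarize(sample)
    print(
        f"{label:>10}: mean={stats['mean']:.2f}  "
        f"stdev={stats['stdev']:.2f}  median={stats['median']:.2f}"
    )
```

The mean and standard deviation, on which normal-based calculations rely, shift strongly with a single bad value, while the median barely moves. This is the kind of robustness that non-parametric methods build on.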
Two main tests are used:
- Dixon's test: particularly useful when the number of data points is small (<30)
- Grubbs' test: can be used in all cases.
Dixon's test
The test is based on two quantities:
- b = The overall range of the measurements (here 14.1)
- a = The distance between the part suspected of being an outlier and its nearest neighbour (here 8.6)
The ratio a/b is calculated and expressed in %.
This ratio is then compared with Dixon's table:
| Number of parts | 3 | 5 | 10 | 16 | 20 | 30 |
| Maximum ratio | 0.94 | 0.72 | 0.46 | 0.38 | 0.34 | 0.30 |
If the ratio is less than the maximum ratio given in the table, then the value is not an outlier. Here the ratio of 62% for 5 parts is less than 72%. The point is therefore not an outlier.
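The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the thresholds are copied from the table above, and the sample is invented so that it reproduces the quoted range (14.1) and gap (8.6). With these rounded figures the ratio comes out at about 0.61, close to the 62% quoted in the text. The sketch only accepts sample sizes that appear in the table rather than guessing thresholds for other sizes.

```python
# Maximum ratios copied from the table above, keyed by number of parts.
DIXON_MAX_RATIO = {3: 0.94, 5: 0.72, 10: 0.46, 16: 0.38, 20: 0.34, 30: 0.30}


def dixon_ratio(values):
    """Return (suspect, a / b) for the extreme value furthest from its neighbour."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("need at least 3 values")
    b = data[-1] - data[0]  # overall range of the measurements
    if b == 0:
        raise ValueError("all values are identical")
    gap_low = data[1] - data[0]     # gap between the smallest value and its neighbour
    gap_high = data[-1] - data[-2]  # gap between the largest value and its neighbour
    if gap_high >= gap_low:
        return data[-1], gap_high / b
    return data[0], gap_low / b


def dixon_is_outlier(values):
    """Return (suspect, ratio, flagged) using the tabulated maximum ratios."""
    n = len(values)
    if n not in DIXON_MAX_RATIO:
        raise ValueError(f"no tabulated maximum ratio for {n} values")
    suspect, ratio = dixon_ratio(values)
    return suspect, ratio, ratio > DIXON_MAX_RATIO[n]


# Hypothetical sample chosen so that b = 14.1 and a = 8.6, as in the text.
sample = [10.0, 11.5, 13.2, 15.5, 24.1]
suspect, ratio, flagged = dixon_is_outlier(sample)
print(f"suspect={suspect}, ratio={ratio:.2f}, outlier={flagged}")
```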
Grubbs' test
To use Grubbs' test, we first calculate:
- X: The mean of all measurements
- S: The standard deviation of all measurements
- G: The distance between the value suspected of being an outlier and the mean, expressed in standard deviations.
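G=\frac{|Value - X|}{S}
G_{limit}=\frac{N-1}{\sqrt{N}}\sqrt{\frac{t^2_{\frac{\alpha}{N},N-2}}{N-2+t^2_{\frac{\alpha}{N},N-2}}}
Here N is the number of measurements and t_{\frac{\alpha}{N},N-2} is the upper critical value of Student's t distribution with N-2 degrees of freedom at risk level \alpha/N (one-sided test).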
If G > G_limit, the value is considered an outlier; otherwise it is not.
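As a rough sketch of the calculation, the code below computes G and the limit using only the Python standard library. Several things here are assumptions rather than part of the original text: the risk level defaults to 5%, S is taken as the sample standard deviation (n-1 denominator), the limit uses the one-sided form with risk divided by N, and the Student quantile is obtained by simple numerical integration and bisection (a statistics library would normally supply it). All function names and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_upper_critical(p, df):
    """Value t with P(T > t) = p, found by bisection (requires 0 < p < 0.5)."""
    if not 0 < p < 0.5:
        raise ValueError("p must be between 0 and 0.5")
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_limit(n, alpha=0.05):
    """Limit value of G for n measurements (alpha = 0.05 is an assumption)."""
    if n < 3:
        raise ValueError("need at least 3 measurements")
    t = t_upper_critical(alpha / n, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_test(values, suspect, alpha=0.05):
    """Return (G, G_limit, flagged) for the suspected value."""
    g = abs(suspect - mean(values)) / stdev(values)  # sample standard deviation
    g_limit = grubbs_limit(len(values), alpha)
    return g, g_limit, g > g_limit


# Hypothetical measurements, invented for this illustration.
sample = [10.0, 11.5, 13.2, 15.5, 24.1]
g, g_limit, flagged = grubbs_test(sample, suspect=24.1)
print(f"G={g:.3f}, G_limit={g_limit:.3f}, outlier={flagged}")
```

The numerical quantile is accurate enough for the small risk levels and sample sizes typical of this test, but it is only meant to keep the sketch dependency-free.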