How do I deal with outliers with Ellistat Data Analysis?

Outlier testing

When analysing a series of data, we sometimes come across values that do not appear to belong to the normal distribution of the data. These points are known as outliers. Intuition alone is not a reliable way to decide whether a value is an outlier: there are tests that can highlight them, and statistical software such as Ellistat can help with the calculations.

From a statistical point of view, an outlier is a value that does not belong to the normal distribution of the data. It can come from:

  • A measurement or copying error (for example, a forgotten decimal point)
  • A special cause, such as a part not being washed before measurement.

All statistical calculations using the properties of the normal distribution (statistical tests, capability calculations, out-of-tolerance % calculations) are very sensitive to the presence of outliers, so it is important to understand their origin and eliminate them before using these calculations. Non-parametric statistical tests, which are much less sensitive to outliers, may also be used.
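To illustrate this sensitivity, here is a minimal sketch using only Python's standard library. The measurements are hypothetical values invented for the illustration; the second series simulates a copying error in which the decimal point of one value was forgotten.

```python
import statistics

# Hypothetical measurements, invented for this illustration.
clean = [9.9, 10.0, 10.1, 10.2, 10.3]
# Same series with a copying error: 10.3 entered as 103 (forgotten decimal point).
with_error = [9.9, 10.0, 10.1, 10.2, 103.0]


def summarize(values):
    """Return the mean, sample standard deviation and median of the values."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
    }


before = summarize(clean)
after = summarize(with_error)

# Print each indicator without and with the erroneous value.
for name in ("mean", "stdev", "median"):
    print(f"{name:>6}: {before[name]:8.3f} -> {after[name]:8.3f}")
```

The mean and standard deviation, on which normal-distribution calculations rely, move strongly with a single erroneous value, while the median, a rank-based statistic of the kind used by non-parametric methods, stays the same in this example.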

Two main tests are used:

Dixon test

[Figure: testing for outliers using the Dixon test]
To use the Dixon test, calculate the ratio a / b, where:
  • b = the overall range of the measurements (here 14.1)
  • a = the distance between the value suspected of being an outlier and its nearest neighbour (here 8.6)

The ratio is expressed as a percentage.

This ratio is then compared with Dixon's table:

       
Number of parts | 3    | 5    | 10   | 16   | 20   | 30
Maximum ratio   | 0.94 | 0.72 | 0.46 | 0.38 | 0.34 | 0.30

If the ratio is less than the maximum ratio given by the table, the value is not an outlier. Here, the ratio 8.6 / 14.1 is about 61%, which for 5 parts is less than the 72% maximum. The point is therefore not an outlier.
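The Dixon procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample values are hypothetical (the sample was chosen so that its range is 14.1 and the gap to the suspect value is 8.6), the maximum ratios are copied from the table above, and the sketch only accepts sample sizes that appear in that table.

```python
# Maximum ratios copied from the table above (number of parts -> maximum ratio).
DIXON_MAX_RATIO = {3: 0.94, 5: 0.72, 10: 0.46, 16: 0.38, 20: 0.34, 30: 0.30}


def dixon_ratio(values):
    """Return (suspect, a / b) for the extreme value farthest from its neighbour."""
    ordered = sorted(values)
    if len(ordered) < 3:
        raise ValueError("at least 3 values are needed")
    spread = ordered[-1] - ordered[0]        # b: overall range
    if spread == 0:
        raise ValueError("all values are identical")
    low_gap = ordered[1] - ordered[0]        # a, if the smallest value is suspect
    high_gap = ordered[-1] - ordered[-2]     # a, if the largest value is suspect
    if high_gap >= low_gap:
        return ordered[-1], high_gap / spread
    return ordered[0], low_gap / spread


def dixon_test(values):
    """Return (suspect, ratio, is_outlier) using the tabulated maximum ratio."""
    n = len(values)
    if n not in DIXON_MAX_RATIO:
        raise ValueError(f"no tabulated maximum ratio for {n} values")
    suspect, ratio = dixon_ratio(values)
    # The value is NOT an outlier when the ratio is below the maximum ratio.
    return suspect, ratio, ratio >= DIXON_MAX_RATIO[n]


# Hypothetical sample: range 24.1 - 10.0 = 14.1, gap 24.1 - 15.5 = 8.6.
sample = [10.0, 11.2, 12.5, 15.5, 24.1]
suspect, ratio, is_outlier = dixon_test(sample)
print(f"suspect={suspect}, ratio={ratio:.1%}, outlier={is_outlier}")
```

Looking up the limit in a dictionary keeps the sketch faithful to the table; handling other sample sizes would need a fuller table or an interpolation rule, which the text does not specify.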

Grubbs' test

To use Grubbs' test, we first calculate:

  • X: the mean of all measurements
  • S: the standard deviation of all measurements
  • G: the distance between the value suspected of being an outlier and the mean, expressed in standard deviations

G=\frac{|Value - X|}{S}

[Figure: testing for outliers using Grubbs' test]
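The value of G obtained is then compared with a limit value G_{limit}:

G_{limit}=\frac{N-1}{\sqrt{N}}\sqrt{\frac{t^2_{\alpha/N,\,N-2}}{N-2+t^2_{\alpha/N,\,N-2}}}

where N is the number of measurements and t_{\alpha/N,\,N-2} is the upper critical value of Student's t distribution with N-2 degrees of freedom at significance level \alpha/N.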

If G > G_{limit}, the value is considered an outlier; otherwise it is not.
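The Grubbs calculation can be sketched in Python with the standard library alone. Several points here are assumptions rather than part of the method as described: the function names and sample values are hypothetical, S is taken to be the sample (N-1) standard deviation, and the Student's t critical value is supplied by the caller, for example from a printed table. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / n, n - 2)` is one way to obtain it.

```python
import math
import statistics


def grubbs_statistic(values, suspect):
    """G = |suspect - X| / S for the suspected value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # assumption: sample (N-1) standard deviation
    return abs(suspect - mean) / s


def grubbs_limit(n, t_crit):
    """Limit value of G for n measurements.

    t_crit is the Student's t critical value at level alpha / n with
    n - 2 degrees of freedom; it must be supplied by the caller.
    """
    if n < 3:
        raise ValueError("at least 3 measurements are needed")
    t2 = t_crit ** 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t2 / (n - 2 + t2))


# Hypothetical sample, invented for this illustration.
values = [10.0, 11.2, 12.5, 15.5, 24.1]
suspect = max(values)

# Value read from a Student's t table: one-sided level 0.05 / 5 = 0.01,
# 3 degrees of freedom.
t_crit = 4.541

g = grubbs_statistic(values, suspect)
g_limit = grubbs_limit(len(values), t_crit)
print(f"G={g:.3f}, G_limit={g_limit:.3f}, outlier={g > g_limit}")
```

Passing the critical value in as a parameter keeps the sketch free of third-party dependencies and makes the chosen significance level explicit at the call site.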

