What is masking and swamping?
With a large number of records, distinguishing the anomalous data points becomes more difficult and time-consuming. This leads to swamping (labelling normal instances as anomalies) and masking (so many anomalies are present that they conceal one another and go undetected). Using a large number of records to identify anomalies is the primary reason behind swamping and masking.
What is the masking effect in statistics?
In tests for outliers, the masking effect is defined and quantified by the loss in power that results when the sample contains more discordant observations than the test anticipates.
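As a rough illustration of masking, the sketch below compares the largest studentized deviation (the statistic a single-outlier test such as Grubbs' test is based on) for a sample with one extreme value and for the same sample with a second extreme value added. The data and the helper name `max_studentized_deviation` are made up for this example; only the Python standard library is assumed.

```python
import statistics


def max_studentized_deviation(values):
    """Largest |x - mean| / s, using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return max(abs(v - mean) for v in values) / sd


# Ten well-behaved (made-up) measurements centred on 10.
baseline = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]

# One extreme value: it stands far out from the rest of the sample.
g_one = max_studentized_deviation(baseline + [26.0])

# Two extreme values: together they pull the mean up and inflate the
# standard deviation, so each one looks less extreme than before.
g_two = max_studentized_deviation(baseline + [25.0, 26.0])

print(f"one outlier:  {g_one:.3f}")
print(f"two outliers: {g_two:.3f}")
```

The statistic is smaller in the second case even though the sample is more contaminated, which is why a test that anticipates only one outlier can lose power when a second one is present.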
What happens when there are too many outliers in a test?
In contrast to masking, swamping can occur when we specify too many outliers in the test. For example, if we test for two or more outliers when there is in fact only a single outlier, both points may be declared outliers, because many tests declare either all or none of the tested points as outliers.
How does the normality assumption affect outlier detection?
If the normality assumption for the data being tested is not valid, then a determination that there is an outlier may in fact be due to the non-normality of the data rather than the presence of an outlier.
Which is the best book for detection of outliers?
Iglewicz and Hoaglin provide an extensive discussion of outlier tests, including those mentioned here as well as others, and also give a good tutorial on the subject of outliers. Barnett and Lewis provide a book-length treatment of the subject.
When to use an upper bound on the number of outliers?
Some outlier tests, such as the Tietjen-Moore test, have the limitation that the number of outliers must be specified exactly. The Generalized Extreme Studentized Deviate (ESD) test requires only an upper bound on the suspected number of outliers and is the recommended test when the exact number of outliers is not known.
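The following is a minimal sketch of the generalized ESD procedure, not a reference implementation. It assumes approximately normal data, uses only the Python standard library, and obtains Student's t quantiles numerically (Simpson's rule plus bisection) instead of calling a statistics package. The function names, the sample data, and the default `alpha=0.05` are choices made for this example.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, steps=2000):
    """CDF via Simpson's rule on [0, |t|], using symmetry about zero."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile for 0.5 < p < 1: bracket by doubling, then bisect."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        lo, hi = hi, hi * 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def generalized_esd(data, max_outliers, alpha=0.05):
    """Return the values flagged as outliers, most extreme first.

    max_outliers is only an upper bound; it must leave at least
    one degree of freedom (len(data) - max_outliers - 1 >= 1).
    """
    n = len(data)
    remaining = list(data)
    removed = []
    n_outliers = 0
    for i in range(1, max_outliers + 1):
        mean = statistics.fmean(remaining)
        sd = statistics.stdev(remaining)
        # Most extreme remaining observation and its test statistic R_i.
        idx = max(range(len(remaining)),
                  key=lambda k: abs(remaining[k] - mean))
        r_i = abs(remaining[idx] - mean) / sd
        # Critical value lambda_i for this step.
        df = n - i - 1
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_ppf(p, df)
        lam = (n - i) * t / math.sqrt((df + t * t) * (n - i + 1))
        removed.append(remaining.pop(idx))
        # The outlier count is the largest i with R_i > lambda_i.
        if r_i > lam:
            n_outliers = i
    return removed[:n_outliers]


# Made-up sample: ten values near 10 plus two extreme values.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 25.0, 26.0]
flagged = generalized_esd(sample, max_outliers=3)
print(flagged)
```

Because the outlier count is taken as the largest step at which the statistic exceeds its critical value, the procedure can still flag both extreme values in this sample even though the first statistic on its own is held down by the inflated standard deviation. In practice, a tested statistics library would be a better source of t quantiles than the numerical routine used here.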