What does it mean to be robust to outliers?

What does it mean to be robust to outliers?

Robust statistics are resistant to outliers. For example, the mean is very susceptible to outliers (it’s non-robust), while the median is not affected by outliers (it’s robust).

What is meant by robustness in statistics?

In statistics, the term robust or robustness refers to the strength of a statistical model, tests, and procedures according to the specific conditions of the statistical analysis a study hopes to achieve. In other words, a robust statistic is resistant to errors in the results.

Which of the following is robust against outliers?

The median absolute deviation is one generally accepted measure of the spread of data points, robust in the sense that it is insensitive to the exact values of outliers unless outliers represent over half of the observations.

Which models are robust to outliers?

You can use a model that’s resistant to outliers. Tree-based models are generally not affected by outliers, while regression-based models are. If you are performing a statistical test, try a non-parametric test instead of a parametric one.

What does it mean when a statistic like the median is said to be robust resistant to outliers?

A fundamental difference between mean and median is that the mean is much more sensitive to extreme values than the median. That is, one or two extreme values can change the mean a lot but do not change the the median very much. Thus, the median is more robust (less sensitive to outliers in the data) than the mean.

How to detect outliers in regularly sampled data?

The other question deals with regularly sampled data. The “outliers” in financial data exhibit some specific patterns that could be detected with specific techniques not applicable in other domains and I’m -in part- looking for those specific techniques.

How many outliers are there in a time series?

In more extreme cases (e.g. the flash crash) the outliers might amount to more than 75% of the data over longer intervals (> 10 minutes). In addition, the (high) frequency of incoming data contains some information about the outlier aspect of the situation. The problem is definitely hard.

How to use robust estimator in time series detection?

This is done by using a robust estimator of the conditional variance (instead of the robust estimator of the unconditional variance I was suggesting earlier): the M-estimation of the GARCH model. Then you will have a robust, time varying estimate of ( μ ^ t, σ ^ t) which are not the same as those produced by the usual GARCH fit.