When is classification accuracy for imbalanced class distributions wrong?

Classification accuracy becomes misleading when intuitions developed on balanced class distributions are applied to imbalanced ones. The practitioner is misled into thinking that a model has good or even excellent performance when it, in fact, does not. Consider an imbalanced dataset with a 1:100 class ratio: a model that always predicts the majority class is correct about 99% of the time, yet it never identifies a single minority example.
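
The 1:100 case can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dataset sizes are made up, and the "model" is a stand-in that ignores its input and always predicts the majority class.

```python
# Hypothetical dataset with a 1:100 class ratio (sizes are illustrative).
n_minority = 100
n_majority = 10_000
y_true = [1] * n_minority + [0] * n_majority

# Stand-in "model": always predicts the majority class (0).
y_pred = [0] * len(y_true)

# Accuracy: correct predictions divided by all predictions.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the minority class: how many positives were actually found.
found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = found / n_minority

print(f"accuracy: {accuracy:.4f}")
print(f"minority recall: {minority_recall:.4f}")
```

The accuracy comes out at roughly 0.99 while the minority recall is 0, which is the gap that balanced-data intuitions hide.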

Why does the accuracy of a classification model fail?

Accuracy and error rate are the de facto standard metrics for summarizing the performance of classification models. Classification accuracy fails on problems with a skewed class distribution because practitioners interpret it using intuitions developed on datasets with an equal class distribution.

How to deal with imbalanced classification, without re-balancing the data?

If you want to get similar (not identical) results to those of rebalancing, without actually rebalancing or reweighting the data, you could try simply setting the threshold equal to the average or median value of the model’s predicted probability of class 1.
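
A minimal sketch of this idea follows. The probabilities are invented for illustration, and `classify` is a hypothetical helper, not part of any library; a real model's predicted probabilities for class 1 would take the place of `probs`.

```python
from statistics import mean, median

# Hypothetical predicted probabilities of class 1 (illustrative values only).
probs = [0.02, 0.03, 0.05, 0.04, 0.01, 0.30, 0.06, 0.02, 0.45, 0.03]


def classify(probs, threshold):
    """Label an example as class 1 when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]


# Default cut-off of 0.5 versus thresholds derived from the predictions themselves.
default_preds = classify(probs, 0.5)
mean_preds = classify(probs, mean(probs))
median_preds = classify(probs, median(probs))

print("positives at 0.5:   ", sum(default_preds))
print("positives at mean:  ", sum(mean_preds))
print("positives at median:", sum(median_preds))
```

With these values the 0.5 cut-off flags nothing, while the mean and median thresholds flag some examples as class 1. Note that a median threshold labels about half of the examples positive by construction, so whether it is appropriate depends on the problem.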

How to classify unbalanced datasets in data science?

Normalization or standardization of the data usually belongs to this step, but I will do it later, once we start building a model that requires normalized features. “BPMeds” has some missing values, but ~96% of the values in the column are zero (no blood pressure medications taken).

What do you need to know about classification accuracy?

Classification accuracy is a metric that summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions. It is easy to calculate and intuitive to understand, making it the most common metric used for evaluating classifier models.
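
As a small illustration of the definition, accuracy can be computed directly from the four confusion-matrix counts. The function name and the counts below are assumptions chosen for the example.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions (tp + tn) divided by the total number of predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined when there are no predictions")
    return (tp + tn) / total


# Illustrative counts: 85 of 100 predictions are correct.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
error_rate = 1 - acc  # error rate is the complement of accuracy

print(f"accuracy:   {acc:.2f}")
print(f"error rate: {error_rate:.2f}")
```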

Why is classification accuracy unreliable in machine learning?

When the skew in the class distribution is severe, accuracy can become an unreliable measure of model performance. The reason for this unreliability lies with the average machine learning practitioner, whose intuitions for classification accuracy were developed on balanced datasets.

When do you need to use balanced accuracy?

Balanced accuracy is a metric that one can use when evaluating how good a binary classifier is. It is especially useful when the classes are imbalanced, i.e. one of the two classes appears a lot more often than the other. This happens in many settings, such as anomaly detection and detecting the presence of a disease.
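
For a binary classifier, balanced accuracy is commonly computed as the average of the recall on the positive class (sensitivity) and the recall on the negative class (specificity). The sketch below is a plain-Python illustration with made-up labels; `balanced_accuracy` is a hypothetical helper written for this example.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Illustrative data: 5 positives, 95 negatives, and a model that always predicts 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)

print(f"accuracy:          {plain_acc:.2f}")
print(f"balanced accuracy: {bal_acc:.2f}")
```

Here plain accuracy is 0.95 while balanced accuracy is 0.50, the same score a classifier with no skill would get. In practice, scikit-learn's `balanced_accuracy_score` in `sklearn.metrics` offers a library implementation.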
