Can data augmentation lead to overfitting?

Data augmentation usually helps prevent overfitting, but some combinations of augmentations can push a model into underfitting instead. This blog post uses semantic segmentation on satellite images as an example to show how different combinations of data augmentations affect training.
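In semantic segmentation, every geometric augmentation has to be applied identically to the image and to its label mask, otherwise the labels no longer line up with the pixels. The sketch below illustrates this with NumPy only. The function name `augment_pair` and the toy tile are hypothetical and are not taken from the post; the choice of 90-degree rotations plus a horizontal flip is an assumption that suits overhead imagery, not the post's actual pipeline.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random rotation and flip to an (H, W, C) image and its (H, W) mask."""
    k = int(rng.integers(0, 4))  # number of 90-degree rotations, 0 to 3
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:  # horizontal flip with probability 0.5
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    return image, mask

rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)
# Toy tile whose first channel encodes the mask, so alignment is easy to check.
image = np.stack([mask * 255, mask * 0, mask * 0], axis=-1)

aug_image, aug_mask = augment_pair(image, mask, rng)
# The first channel should still match the mask after augmentation.
print(np.array_equal(aug_image[..., 0], aug_mask * 255))
```

Stacking more transforms of this kind increases diversity, but it also makes each training sample harder to fit, which is one plausible route to the underfitting mentioned above.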

What does data augmentation mean?

Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data. Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks.
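As a rough illustration of the three techniques named above, here is a minimal NumPy sketch of padding, random cropping, and horizontal flipping. The function names, the 4-pixel margin, and the square 32x32 input are assumptions made for the example; real pipelines typically rely on a dedicated augmentation library instead.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1]

def pad(image, margin):
    """Zero-pad height and width by `margin` pixels on every side."""
    return np.pad(image, ((margin, margin), (margin, margin), (0, 0)), mode="constant")

def random_crop(image, size, rng):
    """Cut a random size x size window out of the image."""
    height, width = image.shape[:2]
    top = int(rng.integers(0, height - size + 1))
    left = int(rng.integers(0, width - size + 1))
    return image[top:top + size, left:left + size]

def augment(image, rng, margin=4):
    """Pad, crop back to the original size, then flip with probability 0.5.

    Assumes a square input image.
    """
    size = image.shape[0]
    out = random_crop(pad(image, margin), size, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = augment(image, rng)
print(augmented.shape)  # (32, 32, 3)
```

Calling `augment` again on the same image generally gives a different shifted or mirrored view, which is how one stored image turns into many training samples.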

What is overfitting in deep learning?

Overfitting occurs when a model fits the training data too closely. The model picks up noise or random fluctuations in the training data and learns them as if they were real patterns. These patterns do not carry over to new data, so they hurt the model's ability to generalize.
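A toy way to see this is to fit polynomials of increasing degree to a handful of noisy points and compare the error on the training points with the error on held-out points. The sketch below is an illustration, not a deep learning experiment: the sine target, the noise level, the sample sizes, and the degrees are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_samples(n):
    """Draw n points from a sine curve with Gaussian noise added."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = noisy_samples(15)   # small training set
x_val, y_val = noisy_samples(200)      # held-out data

results = {}
for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
    print(f"degree={degree:2d}  train_mse={results[degree][0]:.4f}  val_mse={results[degree][1]:.4f}")
```

The training error can only shrink as the degree grows, because each higher-degree model contains the lower-degree ones. A training error that keeps falling while the held-out error stalls or rises is the usual symptom of overfitting, although the exact numbers depend on the random draw.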

How is data augmentation used in machine learning?

Data augmentation is a de facto standard technique, used in nearly every state-of-the-art machine learning model for applications such as image and text classification. Heuristic augmentation schemes are often tuned by hand by human experts with extensive domain knowledge, and even so they may result in suboptimal augmentation policies.

How does data augmentation work for unstructured data?

For unstructured data such as images and text, augmentation techniques range from simple transformations to data generated by neural networks, depending on the complexity of the application. The augmentation techniques for image and text data are discussed separately in the following sections.
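At the simple end of that range, text can be augmented with word-level edits. The sketch below shows two such edits, random deletion and random swap, using only the standard library. The function names, the deletion probability, and the example sentence are hypothetical choices for illustration, not a method taken from this article.

```python
import random

def random_deletion(words, p, rng):
    """Drop each word with probability p, keeping at least one word."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

def random_swap(words, rng):
    """Return a copy of the word list with two randomly chosen positions swapped."""
    out = list(words)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

rng = random.Random(0)
sentence = "data augmentation increases the diversity of training data".split()

print(" ".join(random_deletion(sentence, 0.2, rng)))
print(" ".join(random_swap(sentence, rng)))
```

Edits like these are cheap, but they can change the meaning of a sentence, so the deletion probability and the number of swaps are usually kept small.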

How are data augmentation techniques used in computer vision?

The augmentation techniques above help the model generalize by reducing overfitting, which in turn tends to improve its accuracy. They apply only to computer vision problems with image datasets. Other techniques exist for generating synthetic data for other types of datasets.

How is data augmentation related to rocket engines?

The relationship between deep learning models and the amount of training data they need is analogous to the relationship between rocket engines and fuel. Just as a rocket engine needs a huge amount of fuel for the rocket to complete its mission, a deep learning model needs a huge amount of data to succeed.