31 Oct 2023
Regularization refers to techniques applied during training that help a model generalize to data it hasn’t seen before. Common approaches include data augmentation, weight decay, and dropout.
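As a minimal PyTorch sketch of how two of these are typically wired up (the layer sizes and hyperparameter values here are just illustrative):

```python
import torch
import torch.nn as nn

# A small classifier with a dropout layer between the linear layers.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations during training
    nn.Linear(256, 10),
)

# Weight decay (L2 regularization) is applied through the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

model.train()  # dropout active during training
# ... training loop ...
model.eval()   # dropout disabled at inference
```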
With dropout, during training a random subset of the network's units is "dropped out" (temporarily turned off) with a probability specified as a hyperparameter, typically in the range of 0.2 to 0.5. For each training example, the dropped units are excluded from both the forward and backward passes. As a result, the network becomes more robust and generalizes better to new data.
The key idea behind dropout is that it prevents the network from relying too heavily on any specific neuron, making the network more adaptive and less likely to overfit. During inference (i.e., when making predictions), all neurons are active, and their contributions are scaled by the keep probability (1 − p), so the expected magnitude of activations matches what the network saw during training.
Dropout drops a unit (e.g. a node) and its connections with a specified probability p (a common value is p = 0.5), as in the sketch below.
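A minimal NumPy sketch of this mechanism, assuming classic (non-inverted) dropout; the shapes and p = 0.5 are illustrative:

```python
import numpy as np

def dropout_forward(x, p=0.5, training=True):
    """Classic dropout: drop units with probability p during training,
    scale activations by the keep probability (1 - p) at inference."""
    if training:
        mask = (np.random.rand(*x.shape) >= p)  # 1 = keep, 0 = drop
        return x * mask
    # At inference all units are active, so scale by the keep probability
    # to match the expected activation magnitude seen during training.
    return x * (1 - p)

x = np.random.randn(4, 8)          # batch of 4 examples, 8 units each
h_train = dropout_forward(x, p=0.5, training=True)
h_test  = dropout_forward(x, p=0.5, training=False)
```

In practice most libraries use "inverted" dropout instead: they scale the kept activations by 1 / (1 − p) during training, so that no scaling is needed at inference.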
Why? To prevent co-adaptation, where the network becomes too reliant on particular connections (often a sign of overfitting).