25 Jun 2023
Neural net training fails silently
When you break ordinary code you usually get some kind of exception. In neural nets everything can be syntactically correct while the whole thing is arranged wrong, and it is very hard to tell.
E.g. you forgot to flip your labels when you flipped the images during data augmentation. Your neural network will often still work pretty well because it internally learns to detect flipped images and flips its predictions accordingly (a sketch of this failure mode is below).
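A minimal sketch of that bug, assuming a task where the label depends on orientation (here a hypothetical keypoint x-coordinate; the function and variable names are illustrative):

```python
import numpy as np

def augment(image, keypoint_x):
    """Randomly mirror an image left-right. `keypoint_x` is a hypothetical
    label (horizontal pixel coordinate) that depends on orientation."""
    if np.random.rand() < 0.5:
        image = image[:, ::-1]                        # flip the pixels (width axis)
        # Bug: forgetting the next line leaves the label pointing at the wrong
        # side of the image; training still "works", just silently worse.
        keypoint_x = image.shape[1] - 1 - keypoint_x  # flip the label too
    return image, keypoint_x
```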
The recipe
1 Get to know the data
Do not touch any neural network code. Instead spend a couple of hours scanning and analyzing the data. You can find things like corrupted images, duplicates etc.
Look for imbalances and biases. Pay attention to your own process for classifying the data, which hints at the kinds of architectures you'll eventually want to explore.
Search/filter/sort by whatever you can think of, visualize distributions and outliers along any axis. Outliers almost always uncover some bugs in data quality.
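A small sketch of this kind of exploration, assuming a metadata table with hypothetical columns (the filename and column names are placeholders):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical metadata table: one row per example, with a class label
# and a couple of simple per-image statistics.
df = pd.read_csv("train_metadata.csv")   # assumed columns: label, width, mean_pixel

# Class balance: large skews here often explain surprising model behaviour later.
print(df["label"].value_counts())

# Distributions along any axis you can think of; the tails are where bugs hide.
df["mean_pixel"].hist(bins=50)
plt.title("mean pixel value per image")
plt.show()

# Flag outliers, e.g. tiny images or all-black / all-white ones.
odd = df[(df["width"] < 32) | (df["mean_pixel"] < 1) | (df["mean_pixel"] > 254)]
print(odd.head())
```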
2 End-to-end training skeleton + get dumb baselines
Set up a full training + evaluation skeleton and gain trust in its correctness via a series of experiments. Pick some simple model, e.g. a linear classifier.
- Fix random seed
- Simplify, disable any unnecessary fanciness, turn off data augmentation
- Verify that the loss starts at the expected value, e.g. about -log(1/num_classes) for a softmax classifier with near-uniform initial outputs (see the sketch after this list)
- Input-independent baseline: e.g. set all your inputs to zero. This should perform worse than when you plug in your actual data. Does it?
- Visualize the data that’s going into the network (just before the net)
- Generalize a special case. People often bite off more than they can chew writing a relatively general functionality from scratch. Write a specific function, get that to work and then generalize it later making sure that you get the same result.
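A minimal sketch of a few of these checks (fixed seed, expected initial loss, zero-input baseline) in PyTorch; the model, sizes, and batch here are stand-ins, not the real pipeline:

```python
import math
import torch

torch.manual_seed(0)   # fix random seeds so runs are comparable
# (also seed numpy / python's `random` and dataloader workers if you use them)

num_classes = 10                             # hypothetical
model = torch.nn.Linear(784, num_classes)    # stand-in for your real model
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(64, 784)                     # one batch of (hypothetical) inputs
y = torch.randint(0, num_classes, (64,))

# Initial loss check: with a roughly uniform softmax, the loss should start
# near -log(1/num_classes).
loss = loss_fn(model(x), y)
print(loss.item(), "expected ~", math.log(num_classes))

# Input-independent baseline: zero out the inputs. Real data should eventually
# beat this; if it doesn't, the model isn't extracting signal from the inputs.
loss_zero = loss_fn(model(torch.zeros_like(x)), y)
print("zero-input loss:", loss_zero.item())
```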
3 Overfit
First get a model large enough that it can overfit (focus on driving down training loss), then regularize it. If you cannot reach a low error rate with any model at all, that may indicate bugs or misconfiguration (see the sketch after the list below).
- Picking the model: Don’t be a hero. Don’t be too crazy or creative with exotic architectures. In the early stages of your project simply find the most related paper and copy-paste their simplest architecture that achieves good performance. E.g. for classifying images simply copy a ResNet-50 for your first run.
- Adam is a safe choice of optimizer in the early stages.
- Complexify only one thing at a time. If you have multiple signals to plug into your classifier, plug them in one by one.
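A quick overfitting sanity check, sketched here assuming the `model`, `loss_fn`, and a small fixed batch `x, y` from the earlier skeleton (the learning rate and step count are illustrative):

```python
import torch

# Train on a handful of examples and try to drive the loss toward zero.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())

# If the loss does not approach ~0 on data this small, suspect a bug or
# misconfiguration before scaling anything up.
```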
4 Regularize
- Get more data. The best and preferred way to regularize a model is to add more real training data. It’s a very common mistake to spend a lot of time trying to squeeze performance out of a small dataset when you could be collecting more data.
- Data augment. The next best thing to real data is half-fake data (augmentation and a pretrained backbone are sketched after this list).
- Creative augmentation (fake data)
- Use a pretrained network if you can.
- Smaller input dimensionality. Remove features that may contain spurious signal.
- Add dropout.
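A sketch of a few of these regularizers together, with assumed transform choices and a hypothetical 10-class problem (the pretrained-weights API shown requires torchvision >= 0.13):

```python
import torch
import torchvision
from torchvision import transforms

# Illustrative augmentations for an image classifier; the exact transforms
# and magnitudes are assumptions to tune against your data.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Start from a pretrained backbone and replace the classification head.
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, 10)   # hypothetical 10 classes

# Dropout is easy to bolt on where it makes sense, e.g. right before the head.
model.fc = torch.nn.Sequential(torch.nn.Dropout(p=0.5), model.fc)
```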
5 Tune
- Random search over grid search. Neural nets are often much more sensitive to some hyperparameters than others: if parameter a matters but b has little effect, random search probes a at many distinct values instead of a few fixed grid points (see the sketch after this list).
- Hyperparameter optimization
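A minimal random-search sketch; the parameter names, ranges, and the `train_and_eval` function are hypothetical:

```python
import random

def sample_config():
    # Log-uniform sampling for scale-like parameters such as the learning rate,
    # so the important axes get probed at many distinct values.
    return {
        "lr": 10 ** random.uniform(-5, -2),            # ~[1e-5, 1e-2]
        "weight_decay": 10 ** random.uniform(-6, -2),
        "dropout": random.uniform(0.0, 0.5),
    }

configs = [sample_config() for _ in range(20)]
# results = [(train_and_eval(cfg), cfg) for cfg in configs]   # assumed training function
# best_score, best_cfg = max(results, key=lambda r: r[0])
```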
6 Squeeze out the juice
- Ensemble models (see the sketch after this list)
- Leave it training, for weeks even.
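One simple way to ensemble classifiers, sketched here as averaging softmax outputs of several independently trained models (the model names are placeholders):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the softmax outputs of several trained classifiers on a batch x."""
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)   # averaged class probabilities

# preds = ensemble_predict([model_a, model_b, model_c], x).argmax(dim=-1)
```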
Notes from karpathy.github.io