25 Jun 2023
Neural net training fails silently
When you break ordinary code you usually get some kind of exception. In neural nets everything can be syntactically correct while the whole thing is arranged wrong, and it is very hard to tell.
E.g. you forgot to flip your labels when you flipped the images during data augmentation. Your neural network will often still work pretty well because it internally learns to detect flipped images and flips its predictions accordingly (a sketch of this failure mode is below).
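A minimal sketch of that bug, assuming a task where the label depends on orientation (here a hypothetical keypoint x-coordinate; the function and variable names are illustrative):

```python
import numpy as np

def augment(image, keypoint_x):
    """Randomly mirror an image left-right. `keypoint_x` is a hypothetical
    label (horizontal pixel coordinate) that depends on orientation."""
    if np.random.rand() < 0.5:
        image = image[:, ::-1]                        # flip the pixels (width axis)
        # Bug: forgetting the next line leaves the label pointing at the wrong
        # side of the image; training still "works", just silently worse.
        keypoint_x = image.shape[1] - 1 - keypoint_x  # flip the label too
    return image, keypoint_x
```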
The recipe
1 Get to know the data
Do not touch any neural network code. Instead spend a couple of hours scanning and analyzing the data. You can find things like corrupted images, duplicates etc.
Look for imbalances and biases. Pay attention to your own process for classifying the data, which hints at the kinds of architectures you'll eventually want to explore.
Search/filter/sort by whatever you can think of, visualize distributions and outliers along any axis. Outliers almost always uncover some bugs in data quality.
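A small sketch of this kind of exploration, assuming a metadata table with hypothetical columns (the filename and column names are placeholders):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical metadata table: one row per example, with a class label
# and a couple of simple per-image statistics.
df = pd.read_csv("train_metadata.csv")   # assumed columns: label, width, mean_pixel

# Class balance: large skews here often explain surprising model behaviour later.
print(df["label"].value_counts())

# Distributions along any axis you can think of; the tails are where bugs hide.
df["mean_pixel"].hist(bins=50)
plt.title("mean pixel value per image")
plt.show()

# Flag outliers, e.g. tiny images or all-black / all-white ones.
odd = df[(df["width"] < 32) | (df["mean_pixel"] < 1) | (df["mean_pixel"] > 254)]
print(odd.head())
```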
2 End-to-end training skeleton + get dumb baselines
Set up a full training + evaluation skeleton and gain trust in its correctness via a series of experiments. Pick some simple model, e.g. a linear classifier.
- Fix random seed
- Simplify, disable any unnecessary fanciness, turn off data augmentation
- Verify that the loss starts at the expected value, e.g. about -log(1/num_classes) for a softmax classifier with near-uniform initial outputs (see the sketch after this list)
- Input-independent baseline: e.g. set all your inputs to zero. This should perform worse than when you plug in your actual data. Does it?
- Visualize the data that’s going into the network (just before the net)
- Generalize a special case. People often bite off more than they can chew writing a relatively general functionality from scratch. Write a specific function, get that to work and then generalize it later making sure that you get the same result.
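A minimal sketch of a few of these checks (fixed seed, expected initial loss, zero-input baseline) in PyTorch; the model, sizes, and batch here are stand-ins, not the real pipeline:

```python
import math
import torch

torch.manual_seed(0)   # fix random seeds so runs are comparable
# (also seed numpy / python's `random` and dataloader workers if you use them)

num_classes = 10                             # hypothetical
model = torch.nn.Linear(784, num_classes)    # stand-in for your real model
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(64, 784)                     # one batch of (hypothetical) inputs
y = torch.randint(0, num_classes, (64,))

# Initial loss check: with a roughly uniform softmax, the loss should start
# near -log(1/num_classes).
loss = loss_fn(model(x), y)
print(loss.item(), "expected ~", math.log(num_classes))

# Input-independent baseline: zero out the inputs. Real data should eventually
# beat this; if it doesn't, the model isn't extracting signal from the inputs.
loss_zero = loss_fn(model(torch.zeros_like(x)), y)
print("zero-input loss:", loss_zero.item())
```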
3 Overfit
First get a model large enough that it can overfit (focus on driving down training loss), then regularize it. If you cannot reach a low error rate with any model at all, that may indicate bugs or misconfiguration (see the sketch after the list below).
- Picking the model: Don’t be a hero. Don’t be too crazy or creative with exotic architectures. In the early stages of your project simply find the most related paper and copy-paste their simplest architecture that achieves good performance. E.g. for classifying images simply copy a ResNet-50 for your first run.
- Adam is a safe choice of optimizer in the early stages.
- Complexify only one thing at a time. If you have multiple signals to plug into your classifier, plug them in one by one.
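A quick overfitting sanity check, sketched here assuming the `model`, `loss_fn`, and a small fixed batch `x, y` from the earlier skeleton (the learning rate and step count are illustrative):

```python
import torch

# Train on a handful of examples and try to drive the loss toward zero.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())

# If the loss does not approach ~0 on data this small, suspect a bug or
# misconfiguration before scaling anything up.
```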
4 Regularize
- Get more data. The best and preferred way to regularize a model is to add more real training data. It’s a very common mistake to spend a lot of time trying to squeeze performance out of a small dataset when you could be collecting more data.
- Data augment. The next best thing to real data is half-fake data (augmentation and a pretrained backbone are sketched after this list).
- Creative augmentation (fake data)
- Use a pretrained network if you can.
- Smaller input dimensionality. Remove features that may contain spurious signal.
- Add dropout.
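A sketch of a few of these regularizers together, with assumed transform choices and a hypothetical 10-class problem (the pretrained-weights API shown requires torchvision >= 0.13):

```python
import torch
import torchvision
from torchvision import transforms

# Illustrative augmentations for an image classifier; the exact transforms
# and magnitudes are assumptions to tune against your data.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Start from a pretrained backbone and replace the classification head.
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, 10)   # hypothetical 10 classes

# Dropout is easy to bolt on where it makes sense, e.g. right before the head.
model.fc = torch.nn.Sequential(torch.nn.Dropout(p=0.5), model.fc)
```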
5 Tune
- Random search over grid search. Neural nets are often much more sensitive to some hyperparameters than others: if parameter a matters but b has little effect, random search probes a at many distinct values instead of a few fixed grid points (see the sketch after this list).
- Hyperparameter optimization
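A minimal random-search sketch; the parameter names, ranges, and the `train_and_eval` function are hypothetical:

```python
import random

def sample_config():
    # Log-uniform sampling for scale-like parameters such as the learning rate,
    # so the important axes get probed at many distinct values.
    return {
        "lr": 10 ** random.uniform(-5, -2),            # ~[1e-5, 1e-2]
        "weight_decay": 10 ** random.uniform(-6, -2),
        "dropout": random.uniform(0.0, 0.5),
    }

configs = [sample_config() for _ in range(20)]
# results = [(train_and_eval(cfg), cfg) for cfg in configs]   # assumed training function
# best_score, best_cfg = max(results, key=lambda r: r[0])
```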
6 Squeeze out the juice
- Ensemble models (see the sketch after this list)
- Leave it training, for weeks even.
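One simple way to ensemble classifiers, sketched here as averaging softmax outputs of several independently trained models (the model names are placeholders):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the softmax outputs of several trained classifiers on a batch x."""
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)   # averaged class probabilities

# preds = ensemble_predict([model_a, model_b, model_c], x).argmax(dim=-1)
```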
Notes from karpathy.github.io