Ironically enough, it is neural nets themselves that provide us with an *echo chamber of information* to which we end up becoming overfit.
In other words, YouTube, Facebook, and Twitter optimize for the overfitting of humans.
There needs to be an essay like "Worse is Better" but for training models. Every important domain should have a getting-started memo.
I think we've learned over the past few months that most LLMs are actually suffering from a specific sort of underfitting.
The datasets are too small and the models too large.
Train smaller models for many more epochs on larger datasets.
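
A minimal back-of-the-envelope sketch of the data/model balance this comment points at, assuming the rough Chinchilla-style rule of thumb of ~20 training tokens per parameter (the heuristic and the sample model sizes are assumptions for illustration, not something stated in the thread):

```python
# Rough sketch, assuming the Chinchilla-style heuristic of ~20 training
# tokens per parameter; the exact ratio is an assumption, not a fixed rule.

def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate token budget for a model with n_params parameters."""
    return n_params * tokens_per_param

for n_params in (1e9, 7e9, 70e9):
    tokens = compute_optimal_tokens(n_params)
    # A training set far below this budget means the model is data-starved,
    # i.e. underfit in the sense described above.
    print(f"{n_params / 1e9:>4.0f}B params -> ~{tokens / 1e9:,.0f}B tokens")
```

Read the other way around, the same heuristic says a fixed dataset supports a much smaller model than intuition suggests, which is the "train smaller models on more data" point.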
The point may be that you are always either over- or underfitting. There is no perfect fit, only plausibly functional and serviceable fits. "All models are wrong, but some are useful," as George Box quipped.