ML bugs are so much trickier than bugs in traditional software because rather than getting an error, you get degraded performance (and it's not obvious a priori what ideal performance is). So ML debugging works by continual sanity checking, e.g. comparing to various baselines.

May 14, 2022 · 4:23 PM UTC

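The "continual sanity checking" here can be as simple as asserting the model beats a trivial baseline. A minimal sketch, assuming a classification task (all names hypothetical):

```python
import numpy as np
from collections import Counter

def sanity_check(preds, labels):
    """Compare model accuracy against a trivial majority-class baseline."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    majority_class = Counter(labels.tolist()).most_common(1)[0][0]
    baseline_acc = float(np.mean(labels == majority_class))
    model_acc = float(np.mean(preds == labels))
    # A broken model raises no exception; the only symptom is a number
    # that is quietly worse than it should be.
    if model_acc <= baseline_acc:
        print(f"WARNING: model acc {model_acc:.3f} <= baseline {baseline_acc:.3f}")
    return model_acc, baseline_acc

# A "trained" model that silently fails to beat the baseline:
labels = [0, 0, 0, 1, 1, 0, 0, 1]
preds  = [0, 1, 0, 0, 1, 1, 0, 0]
sanity_check(preds, labels)
```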
Replying to @gdb
That ML bug description sounds like real-life human learning.
Replying to @gdb
Wandb helps a lot.
Replying to @gdb
The only problem with Automated Interpolation is that you never write down all of the assumptions implicit in your training data and choice of categories, so you have no guard rails to tell you when the AI is attempting to extrapolate outside those assumptions.
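A crude sketch of one such guard rail, assuming tabular features (the class name is hypothetical, and this only catches inputs outside the training data's bounding box, not gaps inside it):

```python
import numpy as np

class ExtrapolationGuard:
    """Warns when an input falls outside the range of the training data,
    i.e. when the model is extrapolating rather than interpolating."""
    def __init__(self, X_train):
        self.lo = X_train.min(axis=0)
        self.hi = X_train.max(axis=0)

    def is_extrapolating(self, x):
        return bool(np.any(x < self.lo) or np.any(x > self.hi))

X_train = np.array([[0.0, 1.0], [0.5, 2.0], [1.0, 3.0]])
guard = ExtrapolationGuard(X_train)
print(guard.is_extrapolating(np.array([0.7, 2.5])))  # False: interpolation
print(guard.is_extrapolating(np.array([5.0, 2.5])))  # True: extrapolation
```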
Replying to @gdb
Not just that, there is also the issue of "silently failing." You never really know exactly what representation your model is learning. Beautiful explanation by @karpathy here: karpathy.github.io/2019/04/2…
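One concrete check in the spirit of that post: verify the loss at initialization is roughly -log(1/num_classes), i.e. what an indifferent classifier should score. A minimal sketch (shapes and tolerance are illustrative):

```python
import math
import numpy as np

def check_init_loss(logits, labels, num_classes):
    """Cross-entropy at init should be ~log(num_classes) for a network
    with no learned preference; a large gap hints at a silent bug
    (bad scaling, misaligned labels, etc.)."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    expected = math.log(num_classes)
    print(f"init loss {loss:.3f}, expected ~{expected:.3f}")
    return abs(loss - expected) < 0.5  # loose tolerance

# Near-zero logits stand in for a freshly initialized network.
rng = np.random.default_rng(0)
logits = rng.normal(scale=0.01, size=(128, 10))
labels = rng.integers(0, 10, size=128)
assert check_init_loss(logits, labels, num_classes=10)
```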
Replying to @gdb
So true. I find bugs faster via performance characterization and tracking than by traditional TDD/deliberate code reviews/scanning check-ins.
Replying to @gdb
Something several Jira-pushing managers need to hear!!
Replying to @gdb
ML is not a snowflake. You can have degraded perf in CPU usage, amount of data processed, data fetches, cache usage... For UX, you can decrease usability, retention, ... In PL, you can worsen error messages, worsen type inference, or increase binary sizes...
Replying to @gdb
One can’t emphasize enough the importance of looking at the data (results and labels), at least for computer vision. It usually provides a lot of insight, or at least generates testable hypotheses for further debugging.
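A minimal sketch of that workflow, assuming per-example losses have already been computed (names hypothetical):

```python
import numpy as np

def worst_examples(per_example_loss, k=5):
    """Indices of the k highest-loss examples, the ones most worth
    eyeballing: label noise and systematic failure modes cluster here."""
    return np.argsort(per_example_loss)[::-1][:k]

losses = np.array([0.1, 2.3, 0.4, 5.1, 0.2, 1.7])
for i in worst_examples(losses, k=3):
    # In a real pipeline you'd render the image and its label here.
    print(f"example {i}: loss {losses[i]:.2f}")
```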
Replying to @gdb
The dangers of black-box algos... Still worth using? You bet! Hard as f* to diagnose? You bet! I work mostly with neuro/ES, where even repeatability is not very likely... so I feel your pain!