Practical Tips for More Robust Real-Time ML Models
WIP alert: this is a work in progress. The current information is correct, but more content may be added in the future.
Feature vectors used during real-time ML scoring can break for several reasons:
- Normal model degradation (as time goes by)
- Operational problems (upstream feature sources break, services time out, etc.)
- Adversarial attacks
There are many ways to make an ML model more robust in these scenarios. Some are trade-offs that incur a performance loss; others are pure upside.
Let's see:
Tune feature_fraction, dropout, and similar parameters
Decreasing feature_fraction in gradient-boosted tree algorithms makes the model spread the impact of very important features across more features, which helps soften the blow if those features get attacked.
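As an illustration, this is roughly how the parameter would appear in a LightGBM configuration. The specific values below are arbitrary examples for the sketch, not recommendations:

```python
# Sketch of a LightGBM parameter dict. Setting `feature_fraction` below 1.0
# makes each tree train on a random subset of the features, so importance is
# spread out instead of concentrated in a handful of features.
params = {
    "objective": "binary",
    "num_leaves": 31,
    "feature_fraction": 0.6,   # each tree sees a random 60% of the features
    "bagging_fraction": 0.8,   # row subsampling, a related robustness knob
    "bagging_freq": 1,
}
# This dict would then be passed to lgb.train(params, train_set, ...).
```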
Similarly, increasing the dropout rate in neural networks forces the model not to rely too heavily on any specific feature.
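To make the mechanism concrete, here is a minimal sketch of inverted dropout in plain NumPy, not any particular framework's implementation; the function and variable names are made up for illustration:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero out a random fraction `rate` of units and
    rescale the survivors so the expected activation is unchanged."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((4, 5))            # toy layer activations
out = dropout(a, rate=0.4, rng=rng)
# Each unit is either dropped (0) or scaled up by 1 / (1 - 0.4),
# so downstream layers cannot depend on any single unit being present.
```

Because any unit can vanish on a given training step, the network is pushed toward redundant representations, which is exactly the property that helps when a feature breaks or is attacked at scoring time.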
References
- Another piece I wrote on a related topic, with some overlap with this article's content.
- Wang et al., 2018: Defensive Dropout for Hardening Deep Neural Networks under Adversarial Attacks. Interesting article; explores how to tune the dropout rate in an NN so as to make it more robust against adversarial attacks, albeit focused on computer-vision problems.