On Dealing with Inflation Effects on ML models
Last updated:
WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.
High-inflation scenarios add an extra challenge to training ML models because inflation accelerates model drift and can easily break the nonlinear relationships a model has learned.
Luckily there are several ways to mitigate this problem.
Automatic retraining
If you can, set up automatic retraining for your model. This is the most direct way to handle the problem, and it's a good thing overall because it keeps model drift in check and can also help against adversarial attacks.
There is a trade-off in deciding how to use old data. The two main strategies are sliding window (drop old data as you add new data) and append-only (keep old data and just add more).
In a high-inflation scenario it makes more sense to use the sliding window approach, if you can, so that outdated price levels don't confuse the model. The exception is when you have robust features, as explained below.
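The two strategies differ only in which rows are handed to the training step. Here is a minimal sketch with pandas, assuming a `timestamp` column and a 180-day window; the function name, column names and window length are made up for illustration and are not a recommendation.

```python
import pandas as pd


def select_training_rows(df, as_of, strategy="sliding",
                         window_days=180, time_col="timestamp"):
    """Return the rows to train on at time `as_of` (hypothetical helper)."""
    as_of = pd.Timestamp(as_of)
    # Never train on rows from the future.
    past = df[df[time_col] <= as_of]
    if strategy == "append":
        # Append-only: keep everything seen so far.
        return past
    if strategy == "sliding":
        # Sliding window: drop rows older than `window_days`.
        cutoff = as_of - pd.Timedelta(days=window_days)
        return past[past[time_col] > cutoff]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Toy data: one row per day for 400 days.
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=400, freq="D"),
    "amount": range(400),
})

sliding = select_training_rows(df, "2025-01-31", strategy="sliding")
append_only = select_training_rows(df, "2025-01-31", strategy="append")
```

In a scheduled retraining job you would call something like this once per run, with `as_of` set to the run date.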
Features that remain robust in the presence of inflation
TLDR: Avoid raw monetary values. Use ratios or normalize by the past averages.
Under inflation, every feature based on raw monetary values (e.g. sum_values_last_30d) degrades: as overall prices rise, the same nominal amount no longer means the same thing.
Two ways to mitigate this problem:
Use ratios to capture trends: Instead of using simple statistics over a time horizon, use ratios between them. For example, use the ratio between the total money spent in the last 5 days and in the last 30 days. Inflation affects this much less, because most of it cancels out in the division.
- Of course, the signal isn't exactly the same, but this one is more "stable" over time and will drift more slowly.
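To see the cancellation, here is a small sketch with pandas. It assumes one spend value per day; `spend_ratio` and the toy series are hypothetical. The same spending pattern is computed once at the original price level and once with all prices doubled.

```python
import pandas as pd


def spend_ratio(daily_spend, short_days=5, long_days=30):
    """Ratio of spend in the last `short_days` to spend in the last `long_days`."""
    short_sum = daily_spend.rolling(short_days).sum()
    long_sum = daily_spend.rolling(long_days).sum()
    # NaN until a full `long_days` window is available.
    return short_sum / long_sum


days = pd.date_range("2025-01-01", periods=60, freq="D")
real_spend = pd.Series(10.0, index=days)

# Same behaviour, but every price is twice as high.
inflated_spend = real_spend * 2.0

ratio_real = spend_ratio(real_spend)
ratio_inflated = spend_ratio(inflated_spend)
```

Both ratios come out identical here because the price level is constant within each series. If prices rise inside the 30-day window itself, the cancellation is only partial, and a 30-day total of zero needs separate handling.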
Normalize all monetary values by the past average: Instead of using raw values, normalize them by averages over the short-term past (e.g. the past days or weeks). This way, the model is better able to capture that US$30 means different things depending on whether it's 1980 or 2026. Dividing by the average of the other rows from the same period corrects for the price level.
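A minimal sketch of this normalization with pandas, assuming one monetary value per day and a 7-day trailing window (both are illustrative choices, and `normalize_by_past_average` is a made-up name). The current row is excluded from its own average so the feature only uses information from the past.

```python
import pandas as pd


def normalize_by_past_average(values, window=7):
    """Divide each value by the mean of the previous `window` values."""
    # shift(1) keeps the current row out of its own reference average.
    past_avg = values.shift(1).rolling(window).mean()
    return values / past_avg


days = pd.date_range("2025-01-01", periods=60, freq="D")

# Toy price level growing 1% per day (an exaggerated inflation rate).
price_level = pd.Series(1.01, index=days).cumprod()

# The "same" purchase every day, expressed in nominal money.
nominal_value = 30.0 * price_level

normalized = normalize_by_past_average(nominal_value)
```

The nominal value keeps growing, while the normalized one settles at a constant once a full window of history exists. The first `window` rows are NaN and have to be dropped or filled.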
Some other ideas:
Use the trailing yearly inflation rate as a feature
Use the accumulated inflation rate since the start of the training data, such that it's 1 for the first data point.
- This rate can also be used to normalize the values
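Both ideas can be derived from a series of monthly inflation rates. The sketch below uses a made-up constant rate of 2% per month; in practice the rates would come from whatever official index applies to your data, and all names here are hypothetical.

```python
import pandas as pd

# Hypothetical monthly inflation rates (2% per month, for illustration only).
months = pd.date_range("2024-01-01", periods=24, freq="MS")
monthly_rate = pd.Series(0.02, index=months)

growth = 1.0 + monthly_rate

# Accumulated inflation since the start of the data, rescaled so the
# first data point is exactly 1.
cumulative_index = growth.cumprod() / growth.iloc[0]

# Trailing yearly inflation: product of the last 12 monthly growth factors.
# NaN for the first 11 months, where a full year is not available yet.
trailing_yearly_rate = (
    growth.rolling(12).apply(lambda w: w.prod(), raw=True) - 1.0
)

# Using the accumulated index to normalize: a purchase that costs 30 at the
# start and tracks inflation exactly deflates back to 30 everywhere.
nominal = 30.0 * cumulative_index
deflated = nominal / cumulative_index
```

`trailing_yearly_rate` and `cumulative_index` can be joined onto the training rows by month and used directly as features.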