I built a gradient boosting model to predict liquidity pool yields 24 hours ahead. It used 40 features, including historical APY, trading volume, token volatility, and gas prices. Validation metrics looked exceptional.

The model worked beautifully for six weeks of stable market conditions. Then ETH dropped 15% in one day, and every prediction was off by at least 30 percentage points.

The validation trap

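I had split my data randomly into training and test sets. Both sets came from the same stable market period, so the test set looked almost exactly like the training data, and the strong validation scores only showed that the model could interpolate within that one regime. It had learned to predict small variations around a stable mean, not how yields actually behave under stress.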

When volatility hit, impermanent loss patterns changed completely. Trading volumes spiked in ways the model had never encountered. The careful feature engineering that worked in calm markets became noise.
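To see why calm-market data teaches a model almost nothing about impermanent loss, it helps to look at the standard formula for a 50/50 constant-product pool (Uniswap v2-style, x * y = k). The post doesn't say which pool design was modeled, so treat that as an assumption; trading fees are also ignored here. If one token's price changes by a factor r relative to the other, the LP position's value relative to simply holding both tokens is 2 * sqrt(r) / (1 + r) - 1. The sketch below just evaluates that formula for a typical quiet-day move and for the 15% drop.

```python
import math


def impermanent_loss(price_ratio: float) -> float:
    """Value of a 50/50 constant-product LP position relative to holding, minus 1.

    Assumes a Uniswap v2-style x*y=k pool and ignores trading fees.
    price_ratio is (new price) / (old price) of one token in terms of the other.
    A negative result means the LP is worse off than holding.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 2 * math.sqrt(price_ratio) / (1 + price_ratio) - 1


# Compare a quiet-day 1% move with a 15% one-day drop.
for ratio in (0.99, 0.85):
    print(f"price ratio {ratio}: IL = {impermanent_loss(ratio):.4%}")
```

Near r = 1 the loss grows roughly with the square of the log price move, so the 15% drop produces a few hundred times the impermanent loss of a 1% move. In six weeks of small moves, that term is effectively zero in the training data.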

Why complexity failed

Those 40 features let the model memorize specific market states instead of learning relationships that hold across regimes. It found correlations that existed only in that particular six-week period. Adding more features made the overfitting worse, not better.

A simpler model with five core features would have generalized better because it couldn't memorize as much irrelevant detail.

What survives volatility

Models need to train on data that includes market stress, not just normal conditions. Your validation set must represent scenarios the model will actually face in production, including the ugly ones.

I now keep a separate holdout set from high-volatility periods specifically to test whether predictions hold up when markets move. That metric matters more than validation accuracy during calm periods.
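The post doesn't say how the high-volatility periods are identified. One common approach, assumed here, is to flag days whose trailing realized volatility (the sample standard deviation of daily log returns) exceeds a threshold, then report error separately for calm and stressed days. The function names, the 5-day window, the 0.02 threshold, and the synthetic price series are all hypothetical choices for illustration, not the author's setup.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def rolling_volatility(prices: Sequence[float], window: int) -> List[float]:
    """Trailing sample stdev of log returns, aligned with `prices`.

    vols[i] uses the `window` returns ending at prices[i]; the first
    `window` entries lack enough history and are NaN.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample stdev")
    returns = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]
    vols = [math.nan] * len(prices)
    for i in range(window, len(prices)):
        vols[i] = statistics.stdev(returns[i - window:i])
    return vols


def split_by_regime(vols: Sequence[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Return (calm_indices, stress_indices); days without enough history are skipped."""
    calm: List[int] = []
    stress: List[int] = []
    for i, v in enumerate(vols):
        if math.isnan(v):
            continue
        (stress if v > threshold else calm).append(i)
    return calm, stress


def mae_by_regime(y_true: Sequence[float], y_pred: Sequence[float],
                  calm: Sequence[int], stress: Sequence[int]) -> Dict[str, float]:
    """Mean absolute error reported separately for each regime (NaN if a regime is empty)."""
    def mae(idx: Sequence[int]) -> float:
        if not idx:
            return math.nan
        return sum(abs(y_true[i] - y_pred[i]) for i in idx) / len(idx)
    return {"calm": mae(calm), "stress": mae(stress)}


# Synthetic prices: quiet +/-0.5% days, then a 15% one-day drop at index 20.
prices = [2000.0, 2010.0] * 10 + [1708.5, 1720.0, 1700.0, 1715.0]
vols = rolling_volatility(prices, window=5)
calm, stress = split_by_regime(vols, threshold=0.02)
print(stress)  # → [20, 21, 22, 23]
```

The stressed days stay flagged for as long as the drop sits inside the trailing window. Given a model's predictions for those days, `mae_by_regime` reports the stress-period error next to the calm-period error instead of letting calm days dominate one blended number.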