Regime classifier validity
Every night we run a statistical test on every strategy: do its P&L distributions actually differ across market regimes? If our regime labels are noise, this page shows it. If they're real, it shows that too.
Latest run: 2026-05-30 (UTC) • 39 nightly runs on file • Method: Welch's t-test (p<0.05 significance threshold).
13.5% of regime-pair comparisons show statistically different P&L
2.7× more discrimination than the 5% chance threshold
pair tests across 40 strategies
If the classifier produced random labels, only ~5% of pair-tests would pass p<0.05 by chance. Observing 13.5%, roughly 2.7 times that rate, indicates the labels carry real information about conditional strategy performance. A drop below ~10% would be the early signal that the classifier needs to be rebuilt.
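The comparison against the chance rate can be sketched in a few lines of Python. This is an illustration, not the validator's code: the counts below (27 significant out of 200 pair tests) are hypothetical values chosen to reproduce 13.5%, and the binomial tail treats the pair tests as independent, which is a simplification because pairs within one strategy share trades.

```python
from math import comb

CHANCE_RATE = 0.05  # false-positive rate implied by the p<0.05 threshold


def binomial_tail(k, n, p=CHANCE_RATE):
    """P(X >= k) for X ~ Binomial(n, p), assuming independent tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts, chosen only so that the share equals 13.5%.
n_tests, n_significant = 200, 27

share = n_significant / n_tests   # observed significant share
lift = share / CHANCE_RATE        # multiple of the chance rate
tail = binomial_tail(n_significant, n_tests)

print(f"share={share:.1%}  lift={lift:.1f}x  chance of >= this many by luck={tail:.2e}")
```

Under the independence assumption, a very small tail probability says the observed share is hard to explain by random labels alone. With correlated tests the true probability is larger, so the figure is best read as a rough guide.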
Significance % over time
Nightly % of strategy-regime pair-tests that came back statistically significant (p<0.05). The 5% line is chance. If this trends down toward the red line, the classifier is losing discriminating power and needs rebuilding.
Which timeframe's regime separates strategies best?
Same test, broken down by which higher-timeframe regime the trade was tagged with. The one with the highest significant % is the one that matters most for strategy selection right now.
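A minimal sketch of the per-timeframe breakdown, assuming each pair-test result is available as a `(timeframe, p_value)` tuple. That record shape, the function name, and the sample values are hypothetical and used only for illustration.

```python
from collections import defaultdict


def significant_share_by_timeframe(tests, alpha=0.05):
    """Return {timeframe: share of pair tests with p < alpha}."""
    counts = defaultdict(lambda: [0, 0])  # timeframe -> [significant, total]
    for timeframe, p_value in tests:
        counts[timeframe][1] += 1
        if p_value < alpha:
            counts[timeframe][0] += 1
    return {tf: sig / total for tf, (sig, total) in counts.items()}


# Hypothetical results, not taken from the dashboard.
tests = [
    ("1h", 0.20), ("1h", 0.03),
    ("4h", 0.01), ("4h", 0.04), ("4h", 0.60),
    ("1d", 0.50),
]

shares = significant_share_by_timeframe(tests)
best_timeframe = max(shares, key=shares.get)  # timeframe with the highest share
print(shares, best_timeframe)
```

With few tests per timeframe the shares are noisy, so a ranking like this is more trustworthy once each timeframe has accumulated a reasonable number of pair tests.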
Window: 39 nightly runs so far.
Strategies with the strongest regime-conditional edge
Biggest measurable performance gaps between regimes. These are the strategies where deploying in one regime vs another changes the outcome materially.
| Strategy | Regime A | Mean A | nA | vs. Regime B | Mean B | nB | Gap | p |
|---|---|---|---|---|---|---|---|---|
| ema-13-80-v2 | trending_high_vol | -1.709% | 3 | weak_trend_high_vol | +2.331% | 7 | -4.040% | 0.038 |
| ema-13-80-v1 | trending_high_vol | -2.147% | 5 | weak_trend_high_vol | +1.398% | 10 | -3.545% | 0.004 |
| ema-13-80-v2 | weak_trend_low_vol | -0.543% | 19 | weak_trend_high_vol | +2.331% | 7 | -2.874% | 0.008 |
| ema-13-80-v2 | weak_trend_med_vol | -0.350% | 14 | weak_trend_high_vol | +2.331% | 7 | -2.681% | 0.010 |
| ema-13-80-v1 | trending_high_vol | -2.147% | 5 | ranging_med_vol | +0.444% | 21 | -2.591% | 0.013 |
| ema-13-80-v2 | ranging_low_vol | -0.192% | 19 | weak_trend_high_vol | +2.331% | 7 | -2.523% | 0.014 |
| ema-13-80-v1 | trending_high_vol | -2.147% | 5 | strong_trend_low_vol | +0.372% | 8 | -2.519% | 0.048 |
| ema-13-80-v3 | trending_low_vol | -0.828% | 5 | weak_trend_high_vol | +1.569% | 9 | -2.397% | 0.021 |
| ema-slow | trending_low_vol | -1.284% | 3 | ranging_med_vol | +0.962% | 6 | -2.246% | 0.001 |
| ema-13-80-v1 | trending_low_vol | -0.828% | 5 | weak_trend_high_vol | +1.398% | 10 | -2.226% | 0.022 |
| D_mfi_14_30_williams_r_-85_tp1.25_b120 | weak_trend_low_vol | -1.662% | 3 | weak_trend_med_vol | +0.494% | 3 | -2.156% | 0.028 |
| ema-13-80-v3 | ranging_low_vol | -0.486% | 56 | weak_trend_high_vol | +1.569% | 9 | -2.055% | 0.027 |
Gap = Mean A − Mean B. A positive gap means regime A is better for that strategy; every row shown here has a negative gap, so regime B outperformed. These figures include the v1 baseline portfolio because sample sizes for the v2 research genes are still small. The table will shift to v2 as forward-test trade counts grow.
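As a quick check of the gap definition, the snippet below recomputes two gaps from the table's own Mean A and Mean B values. The helper names are illustrative, not part of the validator.

```python
def gap(mean_a, mean_b):
    """Gap in percentage points: Mean A minus Mean B, rounded like the table."""
    return round(mean_a - mean_b, 3)


def better_regime(mean_a, mean_b):
    """Label which side of the comparison had the higher mean P&L."""
    return "A" if gap(mean_a, mean_b) > 0 else "B"


# (strategy, Mean A %, Mean B %) copied from two rows of the table.
rows = [
    ("ema-13-80-v2", -1.709, 2.331),
    ("ema-slow", -1.284, 0.962),
]

for name, mean_a, mean_b in rows:
    print(name, gap(mean_a, mean_b), better_regime(mean_a, mean_b))
```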
Which regime boundaries matter most?
Pairs of regime labels that most often produced statistically different strategy P&L. Pairs near the top are the real regime boundaries. Pairs that rarely appear here are candidates for merging, because the two labels aren't doing distinct work.
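One way to build such a ranking, sketched under the assumption that each pair-test result is a `(regime_a, regime_b, p_value)` tuple. The record shape and the sample data are hypothetical. The sketch also tracks how often each pair was tested, since a pair can appear rarely simply because few strategies had enough trades in both regimes.

```python
from collections import Counter


def rank_boundaries(tests, alpha=0.05):
    """Return [(pair, significant, tested)] sorted by significant count, descending."""
    tested, significant = Counter(), Counter()
    for regime_a, regime_b, p_value in tests:
        pair = tuple(sorted((regime_a, regime_b)))  # order-independent key
        tested[pair] += 1
        if p_value < alpha:
            significant[pair] += 1
    ranked = sorted(tested, key=lambda pair: (-significant[pair], pair))
    return [(pair, significant[pair], tested[pair]) for pair in ranked]


# Hypothetical results across several strategies.
tests = [
    ("trending_high_vol", "weak_trend_high_vol", 0.004),
    ("weak_trend_high_vol", "trending_high_vol", 0.038),
    ("ranging_low_vol", "ranging_med_vol", 0.410),
    ("ranging_med_vol", "ranging_low_vol", 0.730),
    ("trending_low_vol", "ranging_med_vol", 0.001),
]

ranking = rank_boundaries(tests)
for pair, sig, total in ranking:
    print(pair, f"{sig}/{total} significant")
```

In this toy data the two ranging labels are never separated, which is the pattern that would make them merge candidates.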
How this works
- Our classifier tags every trade with the market regime at entry (and exit) on 1h, 4h, and 1d timeframes. Labels include trending_low_vol, ranging_high_vol, etc.
- For every strategy with enough trades, we group its P&L by the regime label at entry and compute mean and variance per regime.
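- For every pair of regimes with ≥3 trades each, we run Welch's t-test, which does not assume equal variances, to check whether the two regimes' mean P&L differ. p<0.05 means a difference this large would be expected less than 5% of the time if the two regimes had the same mean.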
- If more than 5% of pair-tests come back significant, the classifier is producing labels that carry real information. At or below 5%, the results are indistinguishable from random labels and the classifier needs rebuilding.
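The steps above can be sketched in standard-library Python. This is an illustrative reimplementation under stated assumptions, not the code in regime-validator.js: the function names, the input shape (`{regime_label: [pnl, ...]}` for one strategy), the zero-variance convention, and the sample data are all hypothetical. The two-sided p-value comes from the Student t distribution via the regularized incomplete beta function, with degrees of freedom from the Welch-Satterthwaite formula.

```python
import math
from itertools import combinations
from statistics import mean, variance

MIN_TRADES = 3  # per-regime minimum stated in the text
ALPHA = 0.05    # significance threshold stated in the text


def _beta_cf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # The even step and the odd step share the same update pattern.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def welch_p_value(xs, ys):
    """Two-sided p-value of Welch's unequal-variance t-test."""
    va, vb = variance(xs) / len(xs), variance(ys) / len(ys)
    se2 = va + vb
    if se2 == 0.0:  # both samples constant: convention assumed for this sketch
        return 1.0 if mean(xs) == mean(ys) else 0.0
    t = (mean(xs) - mean(ys)) / math.sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (len(xs) - 1) + vb ** 2 / (len(ys) - 1))
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def pairwise_tests(pnl_by_regime):
    """Test every pair of regimes that has at least MIN_TRADES trades each."""
    eligible = {r: v for r, v in pnl_by_regime.items() if len(v) >= MIN_TRADES}
    return [(ra, rb, welch_p_value(eligible[ra], eligible[rb]))
            for ra, rb in combinations(sorted(eligible), 2)]


# Hypothetical per-trade P&L (%) for one strategy, grouped by entry regime.
sample = {
    "trending_low_vol": [1.0, 2.0, 3.0],
    "ranging_high_vol": [4.0, 5.0, 6.0],
    "weak_trend_med_vol": [2.5, 3.5],  # skipped: fewer than 3 trades
}
results = pairwise_tests(sample)
share = sum(p < ALPHA for _, _, p in results) / len(results)
print(results, f"significant share = {share:.0%}")
```

Pooling `results` across all strategies and taking the share with p below `ALPHA` gives the headline percentage. With only a handful of trades per regime the test has little power, so small samples mostly show up as non-significant pairs.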
The validator runs at 03:00 UTC nightly from src/worker/adaptive-v2/regime-validator.js (private repo). The full method is documented above.
Not financial advice. Regime labels are descriptive, not predictive. A strategy that has performed well in one regime historically may not continue to do so. Past performance does not guarantee future results.