Regime classifier validity

Every night we run a statistical test on every strategy: does its mean P&L actually differ across market regimes? If our regime labels are noise, this page shows it. If they're real, it shows that too.

Latest run: 2026-05-30 (UTC) • 39 nightly runs on file • Method: Welch's t-test (p<0.05 significance threshold).

Is the classifier load-bearing?
13.5% of regime-pair comparisons show statistically different P&L

2.7× more discrimination than the 5% chance threshold

2,623 pair tests across 40 strategies

If the classifier produced random labels, only about 5% of pair-tests would pass p<0.05 by chance. Observing 13.5% (354 of 2,623) indicates that the labels carry real information about conditional strategy performance. A drop below ~10% would be the signal that the classifier needs to be rebuilt.
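The headline figures can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the validator's code: the function name `classifierHealth` and its parameter names are hypothetical, and the count of 354 significant tests is the sum of the per-timeframe counts reported further down this page (38 + 183 + 133).

```javascript
// Hypothetical helper: summarizes one nightly run.
// alpha is the chance rate (5%); rebuildBelow is the ~10% rebuild signal from this page.
function classifierHealth(significant, total, alpha = 0.05, rebuildBelow = 0.10) {
  const rate = significant / total;
  return {
    rate,                                   // share of pair-tests with p < alpha
    liftOverChance: rate / alpha,           // how many times the chance rate
    status: rate < rebuildBelow ? "rebuild" : "ok",
  };
}

const health = classifierHealth(354, 2623);
console.log(
  (100 * health.rate).toFixed(1) + "%",
  health.liftOverChance.toFixed(1) + "x",
  health.status
); // 13.5% 2.7x ok
```

The 2.7× figure is simply 13.5% divided by 5%. It does not account for correlation between tests on the same strategy, so it is best read as a monitoring number rather than a formal significance statement.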

Significance % over time

Nightly % of strategy-regime pair-tests that came back statistically significant (p<0.05). The 5% line is chance. If this trends down toward the red line, the classifier is losing discriminating power and needs rebuilding.

[Chart: nightly significance % for the 1-hour, 4-hour, and Daily regimes, 2026-04-22 to 2026-05-30. Y-axis 0% to 50%, with a reference line at chance (5%).]

Which timeframe's regime separates strategies best?

Same test, broken down by which higher-timeframe regime the trade was tagged with. The one with the highest significant % is the one that matters most for strategy selection right now.

Window: 39 nightly runs so far.

Daily regime: 8.2% (38 of 462 significant)

1-hour regime: 15.9% (183 of 1149 significant)

4-hour regime: 13.1% (133 of 1012 significant)
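As a rough illustration of the breakdown above, the sketch below ranks the timeframes by their significant share and checks that the per-timeframe test counts add up to the 2,623 total. The counts are copied from this page; the function name `rankTimeframes` and the record shape are assumptions made for the example.

```javascript
// Counts copied from the per-timeframe figures on this page.
const byTimeframe = [
  { timeframe: "1d", significant: 38, total: 462 },
  { timeframe: "1h", significant: 183, total: 1149 },
  { timeframe: "4h", significant: 133, total: 1012 },
];

// Hypothetical helper: adds a rate to each row and sorts highest first.
function rankTimeframes(rows) {
  return rows
    .map((r) => ({ ...r, rate: r.significant / r.total }))
    .sort((a, b) => b.rate - a.rate);
}

const ranked = rankTimeframes(byTimeframe);
for (const r of ranked) {
  console.log(`${r.timeframe}: ${(100 * r.rate).toFixed(1)}% (${r.significant} of ${r.total})`);
}

// The three breakdowns should account for every pair test in the headline.
const totalTests = byTimeframe.reduce((sum, r) => sum + r.total, 0);
console.log(`total pair tests: ${totalTests}`); // total pair tests: 2623
```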

Strategies with the strongest regime-conditional edge

Biggest measurable performance gaps between regimes. These are the strategies where deploying in one regime vs another changes the outcome materially.

| Strategy | Regime A | Mean A | nA | vs. Regime B | Mean B | nB | Gap | p |
|---|---|---|---|---|---|---|---|---|
| ema-13-80-v2 | trending_high_vol | -1.709% | 3 | weak_trend_high_vol | +2.331% | 7 | -4.040% | 0.038 |
| ema-13-80-v1 | trending_high_vol | -2.147% | 5 | weak_trend_high_vol | +1.398% | 10 | -3.545% | 0.004 |
| ema-13-80-v2 | weak_trend_low_vol | -0.543% | 19 | weak_trend_high_vol | +2.331% | 7 | -2.874% | 0.008 |
| ema-13-80-v2 | weak_trend_med_vol | -0.350% | 14 | weak_trend_high_vol | +2.331% | 7 | -2.681% | 0.010 |
| ema-13-80-v1 | trending_high_vol | -2.147% | 5 | ranging_med_vol | +0.444% | 21 | -2.591% | 0.013 |
| ema-13-80-v2 | ranging_low_vol | -0.192% | 19 | weak_trend_high_vol | +2.331% | 7 | -2.523% | 0.014 |
| ema-13-80-v1 | trending_high_vol | -2.147% | 5 | strong_trend_low_vol | +0.372% | 8 | -2.519% | 0.048 |
| ema-13-80-v3 | trending_low_vol | -0.828% | 5 | weak_trend_high_vol | +1.569% | 9 | -2.397% | 0.021 |
| ema-slow | trending_low_vol | -1.284% | 3 | ranging_med_vol | +0.962% | 6 | -2.246% | 0.001 |
| ema-13-80-v1 | trending_low_vol | -0.828% | 5 | weak_trend_high_vol | +1.398% | 10 | -2.226% | 0.022 |
| D_mfi_14_30_williams_r_-85_tp1.25_b120 | weak_trend_low_vol | -1.662% | 3 | weak_trend_med_vol | +0.494% | 3 | -2.156% | 0.028 |
| ema-13-80-v3 | ranging_low_vol | -0.486% | 56 | weak_trend_high_vol | +1.569% | 9 | -2.055% | 0.027 |

Gap = Mean A − Mean B. Positive means regime A is better for that strategy; every gap listed above is negative, so regime B is the better one in each of these rows. These figures include the v1 baseline portfolio because the v2 research genes still have small samples. The table will shift to v2 as forward-test trade counts grow.

Which regime boundaries matter most?

Pairs of regime labels that most often produced statistically different strategy P&L. Pairs near the top are the real regime boundaries. Pairs that rarely appear here are candidates for merging, because they aren't doing distinct work.

| Regime pair | Strategies that differ | Avg gap |
|---|---|---|
| trending_high_vol vs weak_trend_high_vol | 20 | +0.744% |
| trending_high_vol vs ranging_med_vol | 18 | +0.541% |
| ranging_low_vol vs weak_trend_high_vol | 16 | +0.586% |
| trending_high_vol vs weak_trend_low_vol | 16 | +0.368% |
| trending_high_vol vs weak_trend_med_vol | 14 | +0.343% |
| ranging_low_vol vs weak_trend_low_vol | 13 | +0.213% |
| trending_high_vol vs strong_trend_med_vol | 12 | +0.389% |
| ranging_low_vol vs ranging_med_vol | 12 | +0.334% |
| trending_high_vol vs strong_trend_low_vol | 11 | +0.610% |
| ranging_low_vol vs weak_trend_med_vol | 11 | +0.126% |
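One way a ranking like this could be built is to group the significant pair-tests by regime pair and count the distinct strategies in each group. The sketch below is illustrative only: `rankBoundaries` and the record fields are hypothetical, and averaging the absolute gap is an assumption, since this page does not say how "avg gap" is computed. The three sample records are rows from the strategy table on this page.

```javascript
// Hypothetical input: one record per significant pair-test.
const significantTests = [
  { strategy: "ema-13-80-v2", regimeA: "trending_high_vol", regimeB: "weak_trend_high_vol", gap: -4.040 },
  { strategy: "ema-13-80-v1", regimeA: "trending_high_vol", regimeB: "weak_trend_high_vol", gap: -3.545 },
  { strategy: "ema-13-80-v1", regimeA: "trending_high_vol", regimeB: "ranging_med_vol", gap: -2.591 },
];

function rankBoundaries(tests) {
  const byPair = new Map();
  for (const t of tests) {
    // Sort the two labels so "A vs B" and "B vs A" land in the same group.
    const pair = [t.regimeA, t.regimeB].sort().join(" vs ");
    const entry = byPair.get(pair) ?? { pair, strategies: new Set(), gapSum: 0, n: 0 };
    entry.strategies.add(t.strategy);
    entry.gapSum += Math.abs(t.gap); // ASSUMPTION: average of absolute gaps
    entry.n += 1;
    byPair.set(pair, entry);
  }
  return [...byPair.values()]
    .map((e) => ({ pair: e.pair, strategies: e.strategies.size, avgGap: e.gapSum / e.n }))
    .sort((a, b) => b.strategies - a.strategies);
}

const boundaries = rankBoundaries(significantTests);
console.log(boundaries[0].pair, boundaries[0].strategies);
// trending_high_vol vs weak_trend_high_vol 2
```

Counting distinct strategies with a `Set` means a strategy that differs on several timeframes for the same pair is counted once, which is one plausible reading of "strategies differ".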

How this works

  1. Our classifier tags every trade with the market regime at entry (and exit) on 1h, 4h, and 1d timeframes. Labels include trending_low_vol, ranging_high_vol, etc.
  2. For every strategy with enough trades, we group its P&L by the regime label at entry and compute mean and variance per regime.
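  3. For every pair of regimes with ≥3 trades each, we run Welch's t-test to check whether the two regimes' mean P&L differs. p<0.05 means a gap this large would show up less than 5% of the time if the two regimes had the same true mean.
  4. If well over 5% of pair-tests come back significant, the classifier is producing labels that carry real information. A rate at or near 5% is what random labels would produce, so the labels are indistinguishable from noise; as stated above, a drop below ~10% is our signal to rebuild.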
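The steps above can be sketched end to end in plain JavaScript. This is not the code in regime-validator.js, which is private. The function names (`validate`, `welch`, `twoSidedP`) and the trade shape `{ strategy, regime, pnl }` are assumptions for illustration. The sketch handles a single timeframe and does not model the per-strategy "enough trades" cutoff, which this page does not specify. The two-sided p-value comes from the Student t distribution via the regularized incomplete beta function.

```javascript
// ln Γ(x) for x > 0 (Lanczos approximation).
function logGamma(x) {
  const c = [76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 1.208650973866179e-3, -5.395239384953e-6];
  let y = x, ser = 1.000000000190015;
  for (const cj of c) ser += cj / ++y;
  const tmp = x + 5.5;
  return (x + 0.5) * Math.log(tmp) - tmp + Math.log(2.5066282746310005 * ser / x);
}

// Continued fraction for the incomplete beta function (modified Lentz method).
function betaCf(a, b, x) {
  const TINY = 1e-300, EPS = 3e-14;
  const fix = (v) => (Math.abs(v) < TINY ? TINY : v);
  let c = 1, d = 1 / fix(1 - (a + b) * x / (a + 1)), h = d;
  for (let m = 1; m <= 300; m++) {
    const m2 = 2 * m;
    let aa = m * (b - m) * x / ((a - 1 + m2) * (a + m2));
    d = 1 / fix(1 + aa * d); c = fix(1 + aa / c); h *= d * c;
    aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1 + m2));
    d = 1 / fix(1 + aa * d); c = fix(1 + aa / c);
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < EPS) break;
  }
  return h;
}

// Regularized incomplete beta function I_x(a, b).
function incBeta(a, b, x) {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const bt = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b)
    + a * Math.log(x) + b * Math.log(1 - x));
  return x < (a + 1) / (a + b + 2)
    ? bt * betaCf(a, b, x) / a
    : 1 - bt * betaCf(b, a, 1 - x) / b;
}

// Two-sided p-value for a t statistic with df degrees of freedom.
function twoSidedP(t, df) {
  return incBeta(df / 2, 0.5, df / (df + t * t));
}

function meanVar(xs) {
  const n = xs.length;
  const mean = xs.reduce((s, x) => s + x, 0) / n;
  const variance = xs.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1);
  return { n, mean, variance };
}

// Welch's unequal-variance t-test with Welch-Satterthwaite degrees of freedom.
function welch(a, b) {
  const A = meanVar(a), B = meanVar(b);
  const sa = A.variance / A.n, sb = B.variance / B.n;
  const gap = A.mean - B.mean; // Gap = Mean A − Mean B
  if (sa + sb === 0) return { gap, t: NaN, df: NaN, p: gap === 0 ? 1 : 0 };
  const t = gap / Math.sqrt(sa + sb);
  const df = (sa + sb) ** 2 / (sa ** 2 / (A.n - 1) + sb ** 2 / (B.n - 1));
  return { gap, t, df, p: twoSidedP(t, df) };
}

// Steps 2-4: group P&L by strategy and entry regime, test every eligible pair.
function validate(trades, minTrades = 3, alpha = 0.05) {
  const byStrategy = new Map();
  for (const { strategy, regime, pnl } of trades) {
    if (!byStrategy.has(strategy)) byStrategy.set(strategy, new Map());
    const regimes = byStrategy.get(strategy);
    if (!regimes.has(regime)) regimes.set(regime, []);
    regimes.get(regime).push(pnl);
  }
  let tests = 0, significant = 0;
  for (const regimes of byStrategy.values()) {
    const eligible = [...regimes.values()].filter((p) => p.length >= minTrades);
    for (let i = 0; i < eligible.length; i++) {
      for (let j = i + 1; j < eligible.length; j++) {
        tests++;
        if (welch(eligible[i], eligible[j]).p < alpha) significant++;
      }
    }
  }
  return { tests, significant, rate: tests ? significant / tests : 0 };
}
```

Calling `validate(trades)` on an array of trade records returns the test count, the number significant at p<0.05, and their ratio, which is the figure tracked nightly on this page. With as few as 3 trades per regime the test has little power, so a non-significant pair does not show that two regimes behave the same.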

The validator runs at 03:00 UTC nightly from src/worker/adaptive-v2/regime-validator.js (private repo). The full method is documented above.

Not financial advice. Regime labels are descriptive, not predictive. A strategy that has performed well in one regime historically may not continue to do so. Past performance does not guarantee future results.