Module 1

Why Most Strategies Fail

Module 1 of 6

88 of 112 strategies we tested failed rigorous validation.

These weren't random ideas from Reddit. Each one was a structured hypothesis (a specific, testable prediction about market behaviour, for example: "Small gaps on ES fill within the first hour at least 65% of the time."), tested against the same validation bar. Statistical significance. Real costs. Out-of-sample consistency.
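A hypothesis phrased this way can be checked directly. As a minimal sketch, assuming you have counted how many qualifying gaps filled within the first hour, a one-sided exact binomial test gives the probability of seeing at least that many fills if the true fill rate were only 65%. The counts below are hypothetical, and this is one reasonable test choice, not necessarily the one used in our pipeline.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p0: float) -> float:
    """One-sided exact binomial test.

    Returns P(X >= successes) when X ~ Binomial(trials, p0), i.e. how often
    a fill rate of only p0 would produce at least this many fills by chance.
    """
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts: 80 of 100 small ES gaps filled within the first hour.
p = binomial_p_value(successes=80, trials=100, p0=0.65)
print(f"p = {p:.4f}")  # below 0.05, so a 65% fill rate is an unlikely explanation
```

With 65 fills out of 100 the same test returns a p-value far above 0.05: hitting the claimed rate exactly is not evidence of beating it.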

Most of them were promising in backtest. Some had 70%+ win rates. Several made money on paper. But when we applied the full validation pipeline, almost all of them broke. Then, in July 2026, we turned the same weapons on our own live book. A re-validation audit found simulator defects behind most of our previously "validated" strategies, and we publicly retracted ten of them. The last of the ten, retracted a day later, was the one we had initially announced as the survivor: hand-checking its fills exposed one more simulator bug. The grid below is the honest state.

Strategy grid legend: RED = failed, AMBER = shadow validating, GREEN = passed all 4 gates

Two approaches to strategy development

What most people do

Backtest until the equity curve looks good
Optimize parameters on the full dataset
Ignore transaction costs and slippage
No out-of-sample testing
Trade it live after one good backtest
Blame "market conditions" when it fails

What actually works

State the hypothesis before testing
Require statistical significance (not just profit)
Include realistic costs in every simulation
Validate on data the strategy never saw
Require robustness to parameter changes
Kill strategies that fail. Move on.

The 3 Gates

Every strategy must pass all three. No exceptions.

Statistical Significance

p < 0.05, n ≥ 100

At least 100 trades, and a p-value below 0.05: if the strategy had no real edge, results at least this good would show up less than 5% of the time by chance alone. Small samples lie.
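A minimal sketch of Gate 1, assuming a list of per-trade net P&L in dollars. The specific test is an assumption: the gate only specifies p < 0.05, so this uses a one-sided z-test on mean P&L (a normal approximation that is reasonable at n ≥ 100). The function name passes_gate_1 is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def passes_gate_1(trade_pnls: list[float], alpha: float = 0.05, min_trades: int = 100) -> bool:
    """Gate 1 sketch: enough trades, and mean P&L significantly above zero."""
    n = len(trade_pnls)
    if n < min_trades:
        return False  # too few trades: reject before trusting any p-value
    sd = stdev(trade_pnls)
    if sd == 0:
        return mean(trade_pnls) > 0  # degenerate case: every trade identical
    z = mean(trade_pnls) / (sd / sqrt(n))
    # One-sided p-value: chance of a mean this high if the true edge were zero.
    p_value = 1 - NormalDist().cdf(z)
    return p_value < alpha
```

A t-test or a permutation test would be equally defensible; the point of the structure is that the trade count is checked first, before any p-value is trusted.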

Profitable After Costs

Expectancy > $0

Include slippage (the difference between the expected and the actual fill price; we assume 1 tick per side as a realistic worst case) and commissions (broker fees: ~$2.50 round-trip for MES, ~$5.00 for ES). Many strategies with "edge" are actually negative after costs.
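The cost arithmetic is easy to get wrong, so here is a sketch. It assumes the standard CME contract specs (ES: $50 per point, MES: $5 per point, 0.25-point tick), the 1-tick-per-side slippage and approximate round-trip commissions stated above, and gross P&L per trade measured in points. The names are illustrative.

```python
TICK_SIZE = 0.25  # index points per tick, for both ES and MES
POINT_VALUE = {"ES": 50.0, "MES": 5.0}  # dollars per index point
COMMISSION_RT = {"ES": 5.00, "MES": 2.50}  # approximate round-trip fees


def net_expectancy(gross_points: list[float], symbol: str, slippage_ticks_per_side: int = 1) -> float:
    """Average dollar P&L per trade after slippage and commission."""
    point_value = POINT_VALUE[symbol]
    tick_value = TICK_SIZE * point_value
    # Slippage is paid on entry and on exit, plus one round-trip commission.
    cost_per_trade = 2 * slippage_ticks_per_side * tick_value + COMMISSION_RT[symbol]
    net = [pts * point_value - cost_per_trade for pts in gross_points]
    return sum(net) / len(net)


print(net_expectancy([2.0], "ES"))  # 70.0: $100 gross - $25 slippage - $5 commission
```

Under these assumptions, an MES strategy that averages a full point per trade before costs nets exactly $0: $5.00 gross, minus $2.50 of slippage (2 ticks × $1.25), minus $2.50 commission.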

Walk-Forward Consistent

WF ≥ 75%

Walk-forward (WF) validation splits the data into sequential windows. The strategy must be profitable in at least 75% of the windows it never trained on. This catches overfitting: a strategy tuned so tightly to historical data that it captures noise, not signal, looks great in backtest and fails live.
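A sketch of the walk-forward mechanics, assuming a rolling scheme with fixed-size training and test windows measured in bars. The window sizes, and the step where you fit the strategy on each training window and score it on the test window, are hypothetical here. Only the out-of-sample P&L of each test window feeds the 75% gate.

```python
from typing import Iterator


def walk_forward_windows(n_bars: int, train_size: int, test_size: int) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.

    Each test range starts immediately after its training range, so the
    strategy is always scored on bars it never saw while being fitted.
    """
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


def passes_gate_3(oos_window_pnls: list[float], threshold: float = 0.75) -> bool:
    """Gate 3: profitable in at least `threshold` of the out-of-sample windows."""
    profitable = sum(1 for pnl in oos_window_pnls if pnl > 0)
    return profitable / len(oos_window_pnls) >= threshold
```

With 10 bars, a 4-bar training window and a 2-bar test window, this yields three windows whose test ranges are bars 4-5, 6-7 and 8-9, none of which overlaps its own training data.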

The Funnel

Hypotheses tested: 112
Statistically significant: ~25
Profitable after costs: ~14
Survived the 2026-07 re-validation audit

This is normal.

A near-90% failure rate isn't bad luck. It's the expected outcome of honest validation. The value isn't in finding strategies. It's in killing the ones that don't work before you risk real money.

Finding a winning strategy is the easy part.

The hard part is what happens when a validated strategy has 16 losses in 22 trades over a full year, and you have to take the next signal anyway. Module 6 shows you exactly what that looks like with real trades from the 2022-2023 bear market, when the showcase H12 strategy ran at a 27.3% win rate for two years and still cleared $2,938, because the math of asymmetric R:R does not care how it feels.
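The arithmetic behind that claim: if every loser costs 1R and every winner pays a fixed multiple of R, expectancy per trade in R is win rate × reward-to-risk minus loss rate. The 3:1 ratio below is a hypothetical chosen for illustration, not H12's actual parameters, and costs are ignored.

```python
def expectancy_in_r(win_rate: float, reward_to_risk: float) -> float:
    """Expected profit per trade in units of R (the amount risked).

    Assumes winners pay reward_to_risk * R and losers cost exactly 1R.
    """
    return win_rate * reward_to_risk - (1 - win_rate)


def breakeven_win_rate(reward_to_risk: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return 1 / (1 + reward_to_risk)


win_rate = 6 / 22  # 16 losses in 22 trades, about 27.3%
print(breakeven_win_rate(3.0))  # 0.25
print(expectancy_in_r(win_rate, 3.0))  # about +0.09R per trade
```

At the same 27.3% win rate, a 2:1 ratio gives 12/22 - 16/22 = -4/22R per trade: the win rate alone says nothing until you know the payoff ratio.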

What is overfitting?

Overfitting is when a strategy learns the noise in historical data instead of the signal. It looks fantastic in backtest because it has been tuned to match every zig and zag of the past. Then it fails live because those specific patterns were random.

Classic signs: the strategy uses many parameters, performance degrades sharply with small parameter changes, and it works on the training data but not on held-out data.

Our defence: Gate 3 (walk-forward validation) and parameter robustness testing. Every strategy must remain profitable when we change each parameter by +/- 20%.
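A sketch of the ±20% robustness check, assuming a hypothetical backtest callable that takes a parameter dictionary and returns net profit after costs. It perturbs one parameter at a time; whether to also test combined perturbations is a design choice not specified here. Integer parameters such as lookback lengths would need rounding after scaling.

```python
from typing import Callable


def is_parameter_robust(
    params: dict[str, float],
    backtest: Callable[[dict[str, float]], float],
    shift: float = 0.20,
) -> bool:
    """Nudge each parameter down and up by `shift`, one at a time, and
    require net profit to stay above zero in every perturbed run."""
    for name, value in params.items():
        for factor in (1 - shift, 1 + shift):
            perturbed = dict(params)  # copy so other parameters stay fixed
            perturbed[name] = value * factor
            if backtest(perturbed) <= 0:
                return False
    return True
```

A smooth profit surface passes; a strategy that only makes money at one exact setting fails on the first nudge, which is the overfitting signature described above.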

Why 100 trades minimum?

Statistical tests need sufficient sample size to distinguish signal from noise. With 30 trades, a strategy could appear profitable purely by luck. At 100+ trades, the confidence intervals tighten enough to draw meaningful conclusions.

Real example: H26 (Smash Day Type B) had 57 trades on ES with a 71.9% win rate and p=0.014. Looks solid. But with only 57 trades, the confidence interval for the true win rate spans from ~59% to ~83%. That's too wide to trust. It also failed walk-forward (56%, against the 75% requirement), confirming the small sample was misleading.
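The interval for H26 can be reproduced approximately. 71.9% of 57 trades is 41 wins. The method behind the ~59% to ~83% figure isn't stated; this sketch uses the Wilson score interval, a standard choice for win rates, which gives roughly 59% to 82%. A similar win rate over 100 trades produces a noticeably narrower interval.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(wins: int, trades: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a true win rate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = wins / trades
    denom = 1 + z**2 / trades
    centre = (p_hat + z**2 / (2 * trades)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trades + z**2 / (4 * trades**2)) / denom
    return centre - half_width, centre + half_width


print(wilson_interval(41, 57))   # H26: roughly (0.59, 0.82)
print(wilson_interval(72, 100))  # same ballpark win rate, tighter interval
```

The interval width shrinks roughly with the square root of the trade count, which is why the 100-trade floor matters more than any single headline win rate.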

7 strategies are live on Topstep XFA today. One of them is remarkably simple. Two rules. No indicators for entry. That's Module 2.

Check your understanding

A strategy backtested over 3 years shows:

45 trades • $12,400 profit • 73% win rate • Profit factor 2.8

Is this strategy ready to trade live?

Not yet. 45 trades is far below the 100-trade minimum. With a sample that small, a 73% win rate could easily be luck.

This is exactly what happened with H26 (Smash Day Type B): 57 trades, 71.9% WR, p=0.014 on ES. Looked great. Failed walk-forward at 56%. The small sample was hiding instability.

Profit factor and win rate mean nothing without sufficient sample size.

7 strategies survived. The simplest one uses just two rules and no indicators for entry.

Module 2: The Edge →