These weren't random ideas from Reddit. Each one was a structured hypothesis (a specific, testable prediction about market behaviour, for example: "Small gaps on ES fill within the first hour at least 65% of the time."), tested against the same validation bar. Statistical significance. Real costs. Out-of-sample consistency.
Most of them were promising in backtest. Some had 70%+ win rates. Several made money on paper. But when we applied the full validation pipeline, almost all of them broke.
Every strategy must pass all three. No exceptions.
At least 100 trades. A p-value below 0.05, meaning less than a 5% chance that results this good came from luck alone. Small samples lie.
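A minimal sketch of how this gate could be checked. It assumes a one-sided test of mean per-trade P&L against zero, using a normal approximation; the pipeline's actual test is not stated here and may differ. The function name `passes_significance_gate` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def passes_significance_gate(pnls, min_trades=100, alpha=0.05):
    """Return (passed, p_value) for a list of per-trade net P&L values.

    Assumption: one-sided z-test of the mean P&L against zero.
    """
    n = len(pnls)
    if n < min_trades:
        return False, None  # too few trades: fail without testing
    se = stdev(pnls) / sqrt(n)  # standard error of the mean
    if se == 0:
        return False, None  # no variation, the test is undefined
    z = mean(pnls) / se
    p_value = 1.0 - NormalDist().cdf(z)  # chance of a mean this high if the true edge were zero
    return p_value < alpha, p_value
```

Note that the trade count is checked first: a 45-trade backtest fails here no matter how good its numbers look.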
Include slippage (the difference between the expected fill price and the actual fill price; we assume 1 tick per side as a realistic worst case) and commissions (broker fees per trade: ~$2.50 round-trip for MES, ~$5.00 for ES). Many strategies with "edge" are actually negative after costs.
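The cost arithmetic can be sketched as follows. The slippage and commission figures come from the text above; the tick values ($12.50 for ES, $1.25 for MES) are the standard contract specifications. The function names are hypothetical, and `gross_pnls` is assumed to hold the gross P&L of each whole position.

```python
# Dollar value of one tick (standard contract specs).
TICK_VALUE = {"ES": 12.50, "MES": 1.25}
# Approximate round-trip commission per contract, from the text above.
COMMISSION_RT = {"ES": 5.00, "MES": 2.50}
SLIPPAGE_TICKS_PER_SIDE = 1  # assumed on both entry and exit

def cost_per_trade(symbol, contracts=1):
    """Slippage on both sides plus round-trip commission, in dollars."""
    slippage = 2 * SLIPPAGE_TICKS_PER_SIDE * TICK_VALUE[symbol]
    return contracts * (slippage + COMMISSION_RT[symbol])

def net_pnl(gross_pnls, symbol, contracts=1):
    """Subtract the per-trade cost from each gross trade result."""
    cost = cost_per_trade(symbol, contracts)
    return [gross - cost for gross in gross_pnls]
```

Under these assumptions one ES contract costs $30 per trade, so a strategy averaging $20 gross per trade loses $10 per trade net.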
Split the data into sequential windows. The strategy must work on data it has never seen. This catches overfitting: a strategy tuned so tightly to historical data that it captures noise, not signal. It looks great in backtest and fails live.
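One way to build those sequential windows, as a rough sketch. It assumes a rolling scheme where each test window sits immediately after its training window and the whole frame then slides forward by one test window. The window sizes and the name `walk_forward_windows` are hypothetical, not the pipeline's actual settings.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_range, test_range) index pairs in time order.

    Each test window starts where its training window ends, so the
    strategy is always scored on bars it was never fitted to.
    """
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window

# Example: 10 bars, fit on 4, test on the next 2.
for train, test in walk_forward_windows(10, 4, 2):
    print(list(train), "->", list(test))
```

The key property is that no test index ever appears in the training range that precedes it.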
A near-90% failure rate isn't bad luck. It's the expected outcome of honest validation. The value isn't in finding strategies. It's in killing the ones that don't work before you risk real money.
The hard part is what happens when a validated strategy has 16 losses in 22 trades over a full year, and you have to take the next signal anyway. Module 6 shows you exactly what that looks like with real trades from the 2022-2023 bear market, when the showcase H12 strategy ran at 27.3% win rate for two years and still cleared $2,938 — because the math of asymmetric R:R does not care how it feels.
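The arithmetic behind that claim can be sketched in a few lines. A 27.3% win rate matches 6 wins in 22 trades. The reward-to-risk ratio of H12 is not stated here, so the value of 3.0 used in the usage note is purely illustrative, and costs are ignored.

```python
def expectancy_r(win_rate, reward_to_risk):
    """Expected profit per trade, in units of the amount risked (R)."""
    return win_rate * reward_to_risk - (1 - win_rate)

def breakeven_reward_to_risk(win_rate):
    """Reward-to-risk ratio at which expectancy is exactly zero."""
    return (1 - win_rate) / win_rate

win_rate = 6 / 22  # 16 losses in 22 trades
print(round(breakeven_reward_to_risk(win_rate), 2))
print(round(expectancy_r(win_rate, 3.0), 3))  # 3.0 is an illustrative ratio
```

At this win rate the break-even ratio is 16/6, about 2.67. Any average winner larger than that multiple of the average loser gives a positive expectancy, even though roughly three trades in four lose.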
Overfitting is when a strategy learns the noise in historical data instead of the signal. It looks fantastic in backtest because it has been tuned to match every zig and zag of the past. Then it fails live because those specific patterns were random.
Classic signs: the strategy uses many parameters, performance degrades sharply with small parameter changes, and it works on the training data but not on held-out data.
Our defence: Gate 3 (walk-forward validation) and parameter robustness testing. Every strategy must remain profitable when we change each parameter by +/- 20%.
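A minimal sketch of the ±20% robustness check, assuming a hypothetical `backtest` callable that takes a parameter dictionary and returns net profit. Only one parameter is shifted at a time, as described above. Integer parameters such as lookback lengths would need rounding, which this sketch leaves out.

```python
def robust_to_params(backtest, params, shift=0.20):
    """True if the strategy stays profitable when each parameter
    is moved by -shift and +shift, one parameter at a time.

    backtest: hypothetical callable(dict) -> net profit.
    """
    if backtest(params) <= 0:
        return False  # the baseline itself must be profitable
    for name, value in params.items():
        for factor in (1 - shift, 1 + shift):
            trial = dict(params)  # copy so the baseline is untouched
            trial[name] = value * factor
            if backtest(trial) <= 0:
                return False
    return True
```

A strategy whose profit collapses when a single parameter moves by 20% is showing exactly the sharp degradation listed among the classic signs of overfitting.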
Statistical tests need sufficient sample size to distinguish signal from noise. With 30 trades, a strategy could appear profitable purely by luck. At 100+ trades, the confidence intervals tighten enough to draw meaningful conclusions.
Real example: H26 (Smash Day Type B) had 57 trades on ES with a 71.9% win rate and p=0.014. Looks solid. But with only 57 trades, the confidence interval for true win rate spans from ~59% to ~83%. That's too wide to trust. It also failed walk-forward (56%), confirming the small sample was misleading.
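The width of that interval can be checked with a short sketch. It assumes a Wilson score interval at 95% confidence and 41 wins out of 57 (inferred from the 71.9% win rate); the method used for the range quoted above is not stated, so the bounds may differ slightly.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(wins, n, confidence=0.95):
    """Wilson score interval for a true win rate, given wins out of n trades."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

low, high = wilson_interval(41, 57)  # 41 wins is an inferred figure
print(f"{low:.1%} to {high:.1%}")
```

With these inputs the interval runs from roughly 59% to 82%, in line with the range quoted above. The same win rate over ten times as many trades gives a much narrower band, which is the point of the 100-trade minimum.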
7 strategies are live on Topstep XFA today. One of them is remarkably simple. Two rules. No indicators for entry. That's Module 2.
A strategy backtested over 3 years shows:
45 trades • $12,400 profit • 73% win rate • Profit factor 2.8
Is this strategy ready to trade live?
B) Correct. 45 trades is far below the 100-trade minimum. With a sample that small, a 73% win rate could easily be luck.
This is exactly what happened with H26 (Smash Day Type B): 57 trades, 71.9% WR, p=0.014 on ES. Looked great. Failed walk-forward at 56%. The small sample was hiding instability.
Profit factor and win rate mean nothing without sufficient sample size.
7 strategies survived. The simplest one uses just two rules and no indicators for entry.