RTH vs ETH: The Data Distribution Mismatch That Broke My Stop Model

After solving the rollover problem, I thought I had my trailing stop dataset under control.

Then I realized I had made another silent mistake:

I trained on ETH and RTH, but I only trade RTH.

That mismatch mattered more than I expected.

Futures Trade 24 Hours. Traders Don’t.

ES trades nearly around the clock.
Its liquidity, however, is far from constant.

There are effectively two different markets inside one contract:

RTH (Regular Trading Hours) — deep liquidity, institutions active
ETH (Electronic Trading Hours) — thinner liquidity, different participants, different behavior

To a machine learning model, these are two different data-generating processes.

What My Dataset Looked Like

I had exported everything:

Overnight Asia
Europe
Pre-market
RTH
Post-close drift

More data felt better.
More bars felt statistically comforting.

But I was training a model on data I never trade.

Why This Matters for Trailing Stops

Trailing stops are microstructure-sensitive.

ETH has:

Wider spreads
Different volatility clustering
More stop runs
Different mean reversion behavior

RTH has:

Institutional volume
Macro flows
News-driven volatility
Cash-market opening and closing auction dynamics

The model learned patterns that never occur in my trading environment.

The Distribution Shift Problem

Supervised learning generally assumes:

Training data distribution ≈ Production data distribution

I violated that assumption.

The model saw mostly ETH bars, but I only acted during RTH.

So it learned:

To tighten aggressively in thin liquidity
To hold during institutional flow
To expect volatility patterns that don’t exist during RTH

In production, it behaved strangely.

Not catastrophically wrong.
Just quietly suboptimal.
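
A mismatch like this can be checked before it reaches production by comparing a feature's distribution in the training set against its distribution in the bars the model will actually act on. The sketch below computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) in plain Python. The function name `ks_statistic` and the sample values are illustrative assumptions, not numbers from the real dataset.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both CDFs are step functions that only change at observed values,
    # so evaluating the gap at every observed value is sufficient.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical bar ranges in points, made up for illustration.
train_ranges = [0.5, 0.75, 0.5, 1.0, 0.75, 0.5, 2.0, 2.5]  # mixed ETH + RTH
live_ranges = [2.0, 2.5, 3.0, 2.25, 2.75, 3.5]             # RTH only

shift = ks_statistic(train_ranges, live_ranges)
same = ks_statistic(live_ranges, live_ranges)
print(shift, same)
```

A value near 0 means the two samples look alike; a value near 1 means they barely overlap. Where to draw the line for "too much shift" is a judgment call, not something this statistic decides for you.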

The Fix: RTH-Only Training

I made a hard decision:

Train Machine B only on RTH data.

That meant:

Filtering bars to 08:30–15:00 CT
Recomputing EMA and ATR on RTH-only data
Validating on RTH-only data
Deploying only in RTH

The model now learned the world it actually traded.
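
The first step in that list, filtering bars to the RTH window, might look like the sketch below. It assumes each bar carries a naive timestamp that has already been converted to Central Time (so daylight saving is handled upstream) and that a bar is stamped with its open time. The `Bar` type and the `is_rth` and `filter_rth` names are hypothetical, and holidays and early closes are not handled.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

RTH_OPEN = time(8, 30)   # 08:30 CT
RTH_CLOSE = time(15, 0)  # 15:00 CT, exclusive


@dataclass(frozen=True)
class Bar:
    ts: datetime  # bar open time, naive, assumed to be Central Time
    high: float
    low: float
    close: float


def is_rth(ts: datetime) -> bool:
    """True for weekday timestamps inside [08:30, 15:00) CT."""
    return ts.weekday() < 5 and RTH_OPEN <= ts.time() < RTH_CLOSE


def filter_rth(bars):
    """Keep only the bars that open inside the RTH window."""
    return [bar for bar in bars if is_rth(bar.ts)]


# Synthetic example: one full session of 1-minute bars, 17:00 to 16:00 CT.
start = datetime(2024, 1, 2, 17, 0)
bars = [Bar(start + timedelta(minutes=i), 1.0, 0.0, 0.5) for i in range(23 * 60)]
rth_bars = filter_rth(bars)
print(len(rth_bars), "of", len(bars), "bars are RTH")
```

On this synthetic day only 390 of 1,380 one-minute bars survive the filter. With time-based bars, a full-session dataset is dominated by ETH simply because ETH lasts longer.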

A Subtle Indicator Trap

Even after filtering bars, one problem remained:

If you compute EMA/ATR on full-session data but trade RTH-only,
your indicators still embed ETH volatility, because their lookback windows include ETH bars.

So I recomputed:

EMA on RTH bars only
ATR on RTH bars only

This changed feature distributions materially.
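
To illustrate why the recomputation matters, here is a minimal sketch of EMA and ATR computed twice: once over all bars and once over RTH bars only. It uses a simple average of true range for ATR (a simplification; Wilder's smoothing is the more common definition, and the variant in the real pipeline may differ). The function names, the `tagged` layout, and all prices are made up for illustration.

```python
def ema(values, span):
    """EMA with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def true_ranges(bars):
    """True range per (high, low, close) bar.

    The first bar has no previous close, so its true range is high - low.
    """
    trs = [bars[0][0] - bars[0][1]]
    for (high, low, _), (_, _, prev_close) in zip(bars[1:], bars[:-1]):
        trs.append(max(high - low, abs(high - prev_close), abs(low - prev_close)))
    return trs


def atr(bars, period):
    """Simple-average ATR over the last `period` bars (needs >= period bars)."""
    return sum(true_ranges(bars)[-period:]) / period


# Hypothetical bars: (session, high, low, close). Quiet ETH, active RTH.
tagged = [
    ("ETH", 5001.0, 5000.0, 5000.5),
    ("RTH", 5004.0, 5000.0, 5003.0),
    ("RTH", 5006.0, 5002.0, 5005.0),
    ("ETH", 5005.5, 5004.5, 5005.0),
    ("ETH", 5005.5, 5004.5, 5005.0),
    ("RTH", 5009.0, 5005.0, 5008.0),
    ("RTH", 5011.0, 5007.0, 5010.0),
]
all_bars = [b[1:] for b in tagged]
rth_bars = [b[1:] for b in tagged if b[0] == "RTH"]

atr_full = atr(all_bars, 4)  # lookback includes the two quiet ETH bars
atr_rth = atr(rth_bars, 4)   # lookback contains RTH bars only
ema_full = ema([c for _, _, c in all_bars], 3)[-1]
ema_rth = ema([c for _, _, c in rth_bars], 3)[-1]
print(atr_full, atr_rth, ema_full, ema_rth)
```

In this toy data the full-session ATR comes out at 2.5 points against 4.0 for RTH only, so a stop sized in ATR multiples would be noticeably tighter than RTH volatility justifies. One caveat: when RTH sessions are simply concatenated, the first bar of each session takes the prior RTH close as its previous close, so the overnight gap still enters that bar's true range. Whether to keep or reset that is a design choice.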

Why More Data Was Worse

This was counterintuitive.

ETH gave me:

More bars
More training samples
Better-looking cross-validation metrics

But worse live performance.

Because the model was optimized for a market I wasn’t trading.

My Production Rule

This became a hard constraint:

Train on what you trade. Trade on what you trained.

If I ever trade ETH, I will train a separate ETH stop model.

Mixing them is easy.
Separating them is correct.
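
One way to make this rule hard to break by accident is to store the session a model was trained on alongside the model and check it at inference time. The sketch below is a hypothetical wrapper: the `SessionWindow` and `GuardedStopModel` names, the lambda standing in for a trained model, and the convention of returning `None` to mean "fall back to a non-ML stop" are all assumptions. Timestamps are assumed to be naive Central Time.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, Optional


@dataclass(frozen=True)
class SessionWindow:
    name: str
    start: time
    end: time  # exclusive

    def contains(self, ts: datetime) -> bool:
        return self.start <= ts.time() < self.end


@dataclass(frozen=True)
class GuardedStopModel:
    predict: Callable[[list], float]  # stand-in for the trained model
    trained_on: SessionWindow

    def stop_distance(self, ts: datetime, features: list) -> Optional[float]:
        """Return the model's output only inside the session it was trained on."""
        if not self.trained_on.contains(ts):
            return None  # caller falls back to a fixed, non-ML stop rule
        return self.predict(features)


RTH = SessionWindow("RTH", time(8, 30), time(15, 0))
model = GuardedStopModel(predict=lambda features: 2.0 * features[0], trained_on=RTH)

inside = model.stop_distance(datetime(2024, 1, 3, 10, 0), [1.5])
outside = model.stop_distance(datetime(2024, 1, 3, 3, 0), [1.5])
print(inside, outside)
```

A separate ETH model would then be another `GuardedStopModel` with its own window, and the two could never silently answer for each other's session.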

The Engineering Lesson

Data quantity is not the same as data relevance.

Distribution mismatch is silent.
Backtests won’t scream.
Live trading will.

What Comes Next

At this point, Machine B had:

Contract-aware training
Session-aware training
Lookahead-safe labels

The final challenge was productionizing the pipeline:

Rolling retrains
Walk-forward validation
Confidence gating in live trading
Safety rules when ML is wrong

In the next post, I’ll walk through how I built Machine B as a production system, not a research toy.

← Previous: The Futures Rollover Trap

Next: Building Machine B: From Research Model to Production System →