Up to this point, the idea of a machine-learned trailing stop engine sounded straightforward.
Give a model market features.
Ask it when to tighten the stop.
Then I hit the real problem:
How do you define a “good” trailing stop decision without looking into the future?
The Labeling Trap
Machine learning is only as good as its labels.
For trailing stops, the naive idea is:
- Tighten = good if trade makes money
- Hold = bad if trade loses
That sounds reasonable. It’s also completely wrong.
Why?
Because trailing stops are path dependent.
A tighten decision can be good even if the trade eventually loses, and bad even if the trade eventually wins.
What matters is what happened after the decision.
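A toy simulation makes the path dependence concrete. All prices, stops, and the exit logic below are hypothetical illustrations, not the real engine: a tighten locks in a small win on a path where holding the original stop rides a reversal into a loss.

```python
# Toy illustration of path dependence (all prices hypothetical).
# Long entry at 100.0; by the decision bar price is already ~101.
# After the decision, price rises to 103, then reverses to 97.
path = [101.2, 102.0, 103.0, 101.0, 99.0, 97.0]

entry = 100.0
wide_stop = 98.0     # original stop: survives the rally, rides the reversal down
tight_stop = 100.8   # stop after tightening at the decision bar

def exit_price(path, stop):
    """First bar close that touches the stop, else the last bar.
    Real bars have highs/lows; closes are a simplification here."""
    for p in path:
        if p <= stop:
            return stop
    return path[-1]

print(exit_price(path, tight_stop) - entry)  # small win from tightening
print(exit_price(path, wide_stop) - entry)   # holding turns into a loss
```

Judged by final trade outcome alone, the holding path looks like "one losing trade." Judged bar by bar, the tighten decision was clearly good.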
Trailing Stops Are Micro-Decisions
Every bar after entry is a decision point:
- Tighten now
- Hold now
That means one trade produces dozens or hundreds of labeled decisions.
Trailing stops are not trade-level labels.
They are bar-level decisions with forward outcomes.
That changed how I thought about the dataset.
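Structurally, this means exploding each trade into one row per bar. A minimal sketch, with assumed field names (the real pipeline surely carries more features per bar):

```python
# Sketch: one trade becomes many bar-level decision rows (names assumed).
# Labels are filled in later by a forward-window labeler,
# never from the final trade outcome.
def decision_rows(trade_id, bars):
    rows = []
    for i, bar in enumerate(bars):
        rows.append({
            "trade_id": trade_id,
            "bar_index": i,
            "close": bar["close"],
            "label": None,  # assigned by the forward-window labeler
        })
    return rows

bars = [{"close": c} for c in (100.5, 101.2, 100.9, 102.0)]
rows = decision_rows("T1", bars)
print(len(rows))  # one decision point per bar
```

A few hundred trades can easily yield tens of thousands of training rows this way, which is part of what makes bar-level labeling attractive for ML.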
Defining “Good” Without Cheating
I framed the question like this:
If I tighten the stop right now, does that improve the outcome compared to holding?
Of course, you can’t know that in real time.
But you can define it in historical data using forward windows.
My Working Label Definition
For each bar after entry:
GOOD TIGHTEN
- Price moves at least X ticks in my favor
- Before the tightened stop would have been hit
- Within Y minutes
BAD TIGHTEN
- The tightened stop is hit
- Before price moves +X ticks
HOLD / NEUTRAL
- Neither condition happens within Y minutes
This avoids hindsight bias because:
- The label only looks forward a fixed window
- It doesn’t use the final trade outcome
- It treats each bar as a decision point
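The definition above can be sketched as a labeling function. This is my hedged reconstruction, not the author's actual code: `x_ticks`, `tick_size`, and the `horizon` (the Y-minute window, expressed in bars) are placeholder parameters, and it assumes a long position checked against bar closes.

```python
# Hedged sketch of the forward-window labeler described above.
# x_ticks, tick_size, and horizon are assumed parameters, not real values.
def label_bar(prices, i, tight_stop, x_ticks, tick_size, horizon):
    """Label the tighten decision at bar i for a long position.

    GOOD_TIGHTEN : price gains >= x_ticks before the tightened stop
                   is hit, within `horizon` forward bars.
    BAD_TIGHTEN  : the tightened stop is hit first.
    HOLD         : neither happens inside the window.
    """
    target = prices[i] + x_ticks * tick_size
    for p in prices[i + 1 : i + 1 + horizon]:
        # Check the stop first: if both could trigger on one bar,
        # we can't know intra-bar order, so assume the worse case.
        if p <= tight_stop:
            return "BAD_TIGHTEN"
        if p >= target:
            return "GOOD_TIGHTEN"
    return "HOLD"

prices = [100.0, 100.2, 100.6, 101.1, 100.9]
print(label_bar(prices, 0, tight_stop=99.8,
                x_ticks=4, tick_size=0.25, horizon=4))
```

Note that the loop only ever reads forward from bar `i` and never touches the trade's final exit, which is exactly what keeps the label free of hindsight.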
Why This Matters More Than the Model
You can use:
- Random Forest
- Gradient Boosting
- Neural networks
- Logistic regression
If your labels are wrong, all models will be wrong.
Label design is where trading ML actually lives.
Why Random Forest Was My First Choice
I started with Random Forests because:
- They handle nonlinear interactions well
- They’re robust to noisy features
- They work well on tabular trading data
- They’re interpretable (feature importance matters for trust)
This was important because Machine B controls real risk.
I wanted a model I could reason about.
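A minimal sketch of that workflow with scikit-learn, on synthetic stand-in data. The feature names (`atr`, `ema_slope`, `bars_in_trade`) are placeholders I chose for illustration, not the actual feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 500 bar-level decisions, 3 assumed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                   # [atr, ema_slope, bars_in_trade]
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # stand-in for GOOD vs not-GOOD

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# The interpretability check: which inputs drive tighten decisions?
for name, imp in zip(["atr", "ema_slope", "bars_in_trade"],
                     model.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Here the irrelevant feature should show near-zero importance. On real data, that importance ranking is the sanity check: if the model leans on a feature you can't justify, you investigate before trusting it with risk.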
A Subtle Realization
Once I built the labeling pipeline, I realized something uncomfortable:
Most of my discretionary trailing decisions were inconsistent with my own historical “good” labels.
In other words, my intuition was not aligned with statistical outcomes.
That was the first sign that Machine B might actually help.
Thesis: If trailing stops are systematic, consistent, and adaptive, the equity curve will take care of itself.
What Comes Next
Defining labels solved one problem and revealed four more:
- Futures contracts roll every quarter
- NinjaTrader exports raw contract prices (no back-adjustment)
- EMA and ATR behave differently across contracts
- RTH and ETH are different volatility regimes
All of that breaks ML models in silent ways.
In the next post, I’ll explain the futures rollover trap and why your model quietly degrades every quarter unless you engineer around it.