Same strategy. Same parameters. Same three-month backtest period. Four different reasoning modes. Here's what happened.
Test Setup
Strategy: BTC momentum on 1-hour candles. Enter long when ADX > 25, RSI pulling back to 45-55, price near 21 EMA. Enter short on the inverse. Stop-loss at recent swing, take-profit at 2:1 reward-to-risk.
Parameters: 5% max position, 3x leverage, 3 concurrent positions, 15% max drawdown.
Period: September – November 2025 (three months of Hyperliquid BTC-PERP data).
AI Model: Same model (Claude Sonnet) across all four modes.
The only variable is the reasoning mode. Everything else is identical.
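To make the entry rule concrete, here is a minimal sketch of the long-side check and the 2:1 target. The post does not say how close "near the 21 EMA" is, so the 0.5% tolerance below is an assumption, as are the `Candle` and `Order` names and the idea that the indicator values arrive precomputed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candle:
    close: float
    adx: float    # assumed precomputed by the data pipeline
    rsi: float    # assumed precomputed
    ema21: float  # 21-period EMA of the close


@dataclass
class Order:
    side: str
    entry: float
    stop: float
    target: float


# Hypothetical tolerance: the strategy only says price is "near" the 21 EMA.
EMA_TOLERANCE = 0.005  # within 0.5% of the EMA


def long_setup(c: Candle, swing_low: float) -> Optional[Order]:
    """Return a long order if all three entry conditions hold, else None."""
    trending = c.adx > 25
    pulled_back = 45 <= c.rsi <= 55
    near_ema = abs(c.close - c.ema21) / c.ema21 <= EMA_TOLERANCE
    if not (trending and pulled_back and near_ema):
        return None

    risk = c.close - swing_low  # distance from entry to the stop
    if risk <= 0:
        return None  # swing low is not below price, so there is no valid stop
    # Take-profit at 2:1 reward-to-risk.
    return Order("long", c.close, swing_low, c.close + 2 * risk)


order = long_setup(Candle(close=100.0, adx=30.0, rsi=50.0, ema21=100.2), swing_low=98.0)
print(order)
```

This is roughly what Direct mode acts on: three boolean checks and an order. The other modes start from the same trigger and then decide whether to trust it.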
Results
Analysis
Direct: The Trigger-Happy Mode
89 trades — the most of any mode. Direct takes every signal at face value. ADX above 25? RSI in range? Price near EMA? Execute. No analysis of whether the setup is actually good, whether market structure supports the trade, or whether conflicting signals exist.
The result: 51.7% win rate, barely above a coin flip. The profit factor is below 1.0 — meaning the agent lost money before credits. After credits, it's significantly negative. Direct mode on a complex strategy is worse than not trading.
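A win rate above 50% and a profit factor below 1.0 can coexist when the average loss is larger than the average win. The sketch below uses the standard definitions (profit factor is gross profit divided by gross loss). The per-trade P&L numbers are invented for illustration and are not taken from the backtest.

```python
def win_rate(pnls):
    """Fraction of trades that closed with a profit."""
    wins = sum(1 for p in pnls if p > 0)
    return wins / len(pnls)


def profit_factor(pnls):
    """Gross profit divided by gross loss; below 1.0 means a net loss."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf")
    return gross_profit / gross_loss


# Illustrative trades only: three small wins, two larger losses.
example_pnls = [100, -150, 80, 60, -140]

print(f"win rate: {win_rate(example_pnls):.0%}")                 # win rate: 60%
print(f"profit factor: {profit_factor(example_pnls):.2f}")       # profit factor: 0.83
```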
Verdict: Don't use Direct for multi-factor strategies. It can't weigh signals against each other.
CoT: The Analyst
76 trades — 15% fewer than Direct. Chain-of-Thought filtered 13 setups that Direct would have taken. In each case, the reasoning chain identified a problem: conflicting timeframe signals, declining volume, proximity to major resistance, or an upcoming funding settlement.
The 13 filtered trades would have been 62% losers. By skipping them, CoT improved the win rate from 51.7% to 57.9% and flipped the profit factor from losing to solidly profitable.
The reasoning chains are readable and auditable. Every trade includes a written analysis: here's what I see, here's what it means, here's my decision. When a CoT trade loses, you can read the reasoning and decide whether the logic was sound (bad luck) or flawed (needs adjustment).
Verdict: CoT offers the best ratio of decision quality to credit cost. It costs 3.4x the credits of Direct but moves the return from -2.1% to +11.4%.
GoT: The Strategist
68 trades — even more selective. Graph-of-Thought doesn't just analyze the obvious setup. For each trigger, it evaluates multiple approaches:
- Path A: Enter the momentum trade
- Path B: Fade the move (mean reversion)
- Path C: Wait for a better entry
- Path D: Skip entirely
The key improvement over CoT isn't the winning trades — it's the losing trades. GoT's average loss is smaller because it sometimes chooses a conservative entry (tighter stop, smaller size) when the setup isn't clean. CoT would either take the full trade or skip it. GoT finds a middle ground.
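One way to picture the parallel evaluation is as a set of candidate paths, each with its own score and position size, from which the best one is selected. The sketch below is only an illustration of that idea: the `Path` type, the scores, and the `min_score` cutoff are all hypothetical, and the post does not describe how the platform actually scores paths.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    name: str
    score: float          # hypothetical expected-value estimate; higher is better
    size_fraction: float  # share of the full position this path would take


def choose_path(paths: List[Path], min_score: float = 0.0) -> Path:
    """Pick the highest-scoring path, or skip if nothing clears min_score."""
    best = max(paths, key=lambda p: p.score)
    if best.score <= min_score:
        return Path("skip", 0.0, 0.0)
    return best


# Made-up scores for a setup that is valid but not clean.
candidates = [
    Path("momentum_full", 0.10, 1.0),     # Path A at full size
    Path("momentum_reduced", 0.18, 0.5),  # Path A with tighter stop, smaller size
    Path("fade", -0.05, 1.0),             # Path B
    Path("wait", 0.12, 0.0),              # Path C
]

print(choose_path(candidates).name)  # momentum_reduced
```

The reduced-size entry is the middle ground described above: a sequential take-it-or-skip-it decision has no slot for it.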
61.8% win rate with a 1.73 profit factor. Max drawdown of 7.6%, roughly half of Direct's 14.2%.
Verdict: GoT is worth the premium for high-conviction strategies. The parallel evaluation catches nuances that sequential reasoning misses.
ReAct: The Researcher
52 trades — the fewest. ReAct spends more time gathering information before committing. For each trigger, it goes through multiple observe-reason-act cycles:
- Check the basic setup (ADX, RSI, EMA)
- Pull order book depth to gauge real liquidity
- Check correlated assets (ETH, SOL) for confirmation
- Review recent leaderboard positions for smart money alignment
- Evaluate funding rate for carry cost
- Make final decision
This iterative process filters aggressively. Many setups that pass the basic indicator checks fail on deeper analysis — thin order books, diverging correlated assets, or unfavorable funding conditions.
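The checks listed above can be sketched as a loop that observes one thing at a time and stops at the first failure. This is a simplification: in a real ReAct run the model chooses its next observation, whereas here the order is fixed. The field names and every threshold (book depth, funding rate) are assumptions made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Check = Tuple[str, Callable[[Dict], bool]]

# Hypothetical checks and thresholds, in the order listed in the post.
CHECKS: List[Check] = [
    ("indicators", lambda m: m["adx"] > 25 and 45 <= m["rsi"] <= 55),
    ("order_book_depth", lambda m: m["depth_usd"] >= 1_000_000),
    ("correlated_assets", lambda m: m["eth_trend"] == m["btc_trend"]),
    ("leaderboard", lambda m: m["smart_money_long"]),
    ("funding", lambda m: abs(m["funding_rate"]) <= 0.0005),
]


def react_decide(market: Dict, checks: List[Check]) -> Tuple[str, List[str]]:
    """Run each check in turn, recording a trace; skip on the first failure."""
    trace: List[str] = []
    for name, check in checks:
        passed = check(market)
        trace.append(f"{name}: {'pass' if passed else 'fail'}")
        if not passed:
            return "skip", trace
    return "enter", trace


# A setup that passes the indicator check but has a thin order book.
snapshot = {
    "adx": 30, "rsi": 50, "depth_usd": 200_000,
    "eth_trend": "up", "btc_trend": "up",
    "smart_money_long": True, "funding_rate": 0.0001,
}

decision, trace = react_decide(snapshot, CHECKS)
print(decision, trace)
```

The trace doubles as an audit log: it shows which observation killed the trade.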
The 52 trades have a 64.2% win rate and 1.91 profit factor. Best of all four modes by every metric except trade count.
Verdict: ReAct produces the best decisions but at 11x the credit cost of Direct. The net return after credits ($816) is the highest, but the marginal improvement over GoT ($620) is modest relative to the additional cost.
The Credit-Quality Curve
Plotting net return against total credit spend reveals a clear pattern:
- Direct → CoT: Massive improvement. +$815 net return gain for +$430 in credits. Every additional dollar of credit spend returns $1.90.
- CoT → GoT: Solid improvement. +$88 net return gain for +$412 in credits. Each additional credit dollar returns $0.21.
- GoT → ReAct: Marginal improvement. +$196 net return gain for +$124 in credits. Each additional credit dollar returns $1.58.
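The biggest bang for your buck is upgrading from Direct to CoT. Neither later upgrade matches it per credit dollar: on the figures above, CoT → GoT returns $0.21 and GoT → ReAct returns $1.58. GoT and ReAct are better in absolute terms, but each pays back less per additional dollar than the first step.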
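Each per-dollar figure in the list above is the net return gain divided by the additional credit spend. A quick check of that arithmetic, using only the dollar amounts quoted in this section (the `STEPS` layout and function name are just for illustration):

```python
# (step, net return gain in $, additional credit spend in $), as quoted above.
STEPS = [
    ("Direct -> CoT", 815, 430),
    ("CoT -> GoT", 88, 412),
    ("GoT -> ReAct", 196, 124),
]


def marginal_return(net_gain: float, extra_credits: float) -> float:
    """Net return gained per additional dollar of credit spend."""
    return net_gain / extra_credits


for label, gain, credits in STEPS:
    print(f"{label}: ${marginal_return(gain, credits):.2f} per credit dollar")
# Direct -> CoT: $1.90 per credit dollar
# CoT -> GoT: $0.21 per credit dollar
# GoT -> ReAct: $1.58 per credit dollar
```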
Recommendations
For most users: Chain-of-Thought. It filters bad trades, produces readable reasoning, and costs a fraction of GoT/ReAct. The 11.4% return with $608 in credits is the practical sweet spot.
For high-value decisions: Graph-of-Thought. When the decision is worth the extra analysis — large positions, unclear setups, conflicting signals — GoT's parallel evaluation catches things CoT misses.
For scanning and discovery: ReAct. When the agent needs to explore (scan 50 markets, research a new strategy, analyze an unusual market condition), ReAct's iterative approach is uniquely suited. Don't use it for routine execution.
For execution only: Direct. When the decision is already made and the agent just needs to place the order, Direct is fast and cheap. Never use it for analysis.
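If you route work programmatically, the four recommendations above reduce to a small lookup with CoT as the default. The task labels and the `recommend_mode` helper are hypothetical names for this sketch, not part of any platform API.

```python
from enum import Enum


class Mode(Enum):
    DIRECT = "direct"
    COT = "cot"
    GOT = "got"
    REACT = "react"


def recommend_mode(task: str) -> Mode:
    """Map a task category to a reasoning mode; anything unlisted gets CoT."""
    table = {
        "execution": Mode.DIRECT,  # decision already made, just place the order
        "high_value": Mode.GOT,    # large positions, unclear or conflicting setups
        "discovery": Mode.REACT,   # scanning markets, researching strategies
    }
    return table.get(task, Mode.COT)


print(recommend_mode("execution"))  # Mode.DIRECT
print(recommend_mode("routine"))    # Mode.COT
```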
Single-Goal Reality
When this comparison was first published, we recommended splitting workflows across multiple goals with different reasoning modes — a ReAct scanner feeding a CoT analyzer feeding a Direct executor. That was goal chaining, and it made sense when older models struggled with large context windows.
With Sonnet 4.6 and Opus 4.6, that's no longer necessary for most strategies. A single CoT or ReAct goal can now scan markets, analyze setups, evaluate risk, and execute trades in one run. The model holds the entire workflow in context without losing coherence. Most users now run one goal per trigger or asset.
The reasoning mode comparison above still holds — CoT is still the sweet spot, ReAct still produces the best decisions. You're just picking one mode for the whole job instead of stitching three modes together.
For most users today: one CoT goal per asset. It handles the full scan-analyze-execute loop in a single pass. Use ReAct when you need the agent to explore unfamiliar territory. Use Direct only for pure execution where the decision is already made externally.
Same strategy, four reasoning modes: Direct loses money, CoT returns 11.4%, GoT returns 16.8%, ReAct returns 19.2%. Chain-of-Thought is the sweet spot for most users.