Sorting Prompts Evolution - Finding the Perfect Balance

February 1, 2025 · 9 min read

Architect

In our quest to improve market comparison accuracy, we've experimented with different versions of sorting prompts. This analysis compares their effectiveness and helps identify the optimal approach.

Test Setup

Markets: BTCUSDT vs ETHUSDT
Timeframe: 4h
Core Indicators: RSI, MACD, Bollinger Bands, Volume, ADX, Volatility, Ichimoku
Models Tested: LLAMA3.2:3B, GEMMA2:2B, DEEPSEEK:R1:1.5B
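One way to keep this test matrix reproducible is to record it as a small config object. The sketch below is a hypothetical example, not the harness used for these runs. The `PromptTestConfig` class, its field names, and the `entry_v1` to `entry_v5` version list are assumptions. The model names are copied from the list above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PromptTestConfig:
    """Hypothetical description of one prompt-comparison experiment."""
    markets: tuple[str, str] = ("BTCUSDT", "ETHUSDT")
    timeframe: str = "4h"
    indicators: tuple[str, ...] = (
        "RSI", "MACD", "Bollinger Bands", "Volume", "ADX", "Volatility", "Ichimoku",
    )
    models: tuple[str, ...] = ("LLAMA3.2:3B", "GEMMA2:2B", "DEEPSEEK:R1:1.5B")
    prompt_versions: tuple[str, ...] = (
        "entry_v1", "entry_v2", "entry_v3", "entry_v4", "entry_v5",
    )

    def runs(self) -> list[tuple[str, str]]:
        """Every (model, prompt version) pair, each evaluated on the same market pair."""
        return list(product(self.models, self.prompt_versions))


config = PromptTestConfig()
for model, version in config.runs():
    print(f"{model} | {version} | {config.markets[0]} vs {config.markets[1]} @ {config.timeframe}")
```

The DeepSeek results below cover only four prompt versions. A real harness would therefore probably need a per-model version list rather than the full cross product.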

Entry Prompts Analysis

LLAMA3.2:3B

Below is a concise comparative analysis of the five prompt versions (including the new Version 5) and their outputs, focusing on:

Most Detailed and Well-Reasoned Analysis
Correlation Between Prompt Length/Structure and Reasoning Quality
Best Balance of Conciseness and Effectiveness
Patterns in Confidence Levels

1. Most Detailed and Well-Reasoned Analysis

Version 1 (entry_v1)
- Key Focus: A fairly short prompt. The final reasoning primarily highlights MACD, ADX, and some Bollinger/stochastic data.
- Detail Level: Moderate. Mentions a handful of indicators but omits deeper numeric comparisons beyond referencing ADX/stochastic.
Version 2 (entry_v2)
- Key Focus: Walks through RSI, MACD, Bollinger, volume, ADX, volatility compression, Ichimoku—but in a more narrative style.
- Detail Level: Moderate. It references the key indicators but does not always provide numeric details or rigorous thresholds in the explanation.
Version 3 (entry_v3)
- Key Focus: RSI, MACD, Bollinger, volume, ADX, Ichimoku. The final text references multiple indicators but is somewhat brief.
- Detail Level: Moderate. Similar to Version 2, it covers most signals but does not dissect them exhaustively in the conclusion.
Version 4 (entry_v4)
- Key Focus: RSI, MACD, Bollinger, volume, ADX, volatility, Ichimoku. The prompt also introduces a scoring system (+1, 0, -1).
- Detail Level: Fairly good. It systematically references more indicators in the final reasoning and ties them to a “SWING BUY” framework.
Version 5 (entry_v5)
- Key Focus: Thoroughly lists seven criteria (RSI, MACD, Bollinger, volume, ADX, volatility, support).
- Detail Level: Potentially the highest coverage of indicators. However, in its final reasoning, there are contradictory or inconsistent numeric comparisons (e.g., calling a 15.3052 MACD strength “lower” than 0.2453).
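Version 4's +1 / 0 / -1 scoring system can be sketched roughly as follows. The post does not give entry_v4's actual thresholds, so the following are all illustrative assumptions: RSI 30/70, ADX 25, price versus the Bollinger Bands, volume versus its average, and the rule that a total of +3 or more counts as a “SWING BUY”. The two snapshots are made-up readings, not the BTCUSDT/ETHUSDT data from these runs.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Indicator readings for one market on one timeframe (illustrative fields)."""
    rsi: float
    macd_hist: float  # MACD line minus signal line
    close: float
    bb_lower: float
    bb_upper: float
    volume: float
    volume_avg: float
    adx: float
    plus_di: float
    minus_di: float


def score(s: Snapshot) -> dict[str, int]:
    """Assign +1 (bullish), 0 (neutral) or -1 (bearish) per indicator; thresholds are assumptions."""
    return {
        "rsi": 1 if s.rsi < 30 else -1 if s.rsi > 70 else 0,
        "macd": 1 if s.macd_hist > 0 else -1 if s.macd_hist < 0 else 0,
        "bollinger": 1 if s.close <= s.bb_lower else -1 if s.close >= s.bb_upper else 0,
        "volume": 1 if s.volume > s.volume_avg else 0,
        # ADX measures trend strength only; direction comes from +DI vs -DI.
        "adx": (1 if s.plus_di > s.minus_di else -1) if s.adx > 25 else 0,
    }


def swing_buy(s: Snapshot, min_total: int = 3) -> bool:
    """Hypothetical rule: call a SWING BUY when the summed score reaches min_total."""
    return sum(score(s).values()) >= min_total


# Made-up readings for two markets, only to exercise the functions.
a = Snapshot(rsi=28, macd_hist=0.4, close=95, bb_lower=96, bb_upper=110,
             volume=1200, volume_avg=1000, adx=30, plus_di=22, minus_di=18)
b = Snapshot(rsi=55, macd_hist=-0.1, close=100, bb_lower=96, bb_upper=110,
             volume=800, volume_avg=1000, adx=18, plus_di=15, minus_di=20)
for name, snap in (("A", a), ("B", b)):
    print(name, score(snap), sum(score(snap).values()), swing_buy(snap))
```

A per-indicator breakdown like this makes it easier for the model to cite each signal explicitly in its reasoning. Plausibly, that is part of why Version 4's output stays coherent.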

Overall:

Version 5 references all key metrics, so it has the broadest coverage.
Version 4 is more internally consistent (fewer contradictory statements) while still referencing multiple indicators.

Depending on your priority (completeness vs. consistency), either Version 4 or 5 may be considered “most detailed/well-reasoned.” Version 5 has the fullest scope; Version 4 may be more coherent.
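Version 5's slip of calling 15.3052 “lower” than 0.2453 can be caught by checking each numeric claim against the numbers it cites. The sketch below rests on two assumptions. First, the claims have already been extracted from the model's reasoning into (label, left, relation, right) tuples. Second, the relation vocabulary is limited to the four words listed. Neither the extraction step nor this check is part of the original pipeline.

```python
import operator
from typing import NamedTuple

# Hypothetical vocabulary mapping comparison words to the check they imply.
RELATIONS = {
    "lower": operator.lt,
    "weaker": operator.lt,
    "higher": operator.gt,
    "stronger": operator.gt,
}


class Claim(NamedTuple):
    label: str
    left: float
    relation: str  # one of RELATIONS
    right: float


def contradictions(claims: list[Claim]) -> list[Claim]:
    """Return every claim whose stated relation does not hold for its own numbers."""
    return [c for c in claims if not RELATIONS[c.relation](c.left, c.right)]


claims = [
    Claim("MACD strength", 15.3052, "lower", 0.2453),   # the inconsistency noted above
    Claim("MACD strength", 15.3052, "higher", 0.2453),  # the consistent phrasing
]
for c in contradictions(claims):
    print(f"Inconsistent: {c.label} {c.left} is not {c.relation} than {c.right}")
```

The number of flagged claims gives a rough consistency score. That score can be weighed against indicator coverage when choosing between Versions 4 and 5.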

2. Correlation of Prompt Length & Structure With Reasoning Quality

Version 1 is the shortest, producing a concise final reasoning with only moderate detail.
Versions 2–5 have increasingly structured prompts (lists of indicators or scoring criteria), leading to more comprehensive coverage in the final text.

The general trend is: more structured prompts yield more thorough final analyses. Versions 4 and 5, which explicitly list each indicator and threshold, lead to the broadest coverage.
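In this comparison, “coverage” essentially means how many of the seven core indicators a final reasoning mentions. Below is a rough keyword-based sketch of that measure. The regex patterns are assumptions. Plain keyword matching will also miss paraphrases, such as “upper band” without the word “Bollinger”. The sample sentence is invented, not a model output from these runs.

```python
import re

# Hypothetical keyword patterns for the seven core indicators in the test setup.
INDICATOR_PATTERNS = {
    "RSI": r"\brsi\b",
    "MACD": r"\bmacd\b",
    "Bollinger Bands": r"\bbollinger\b",
    "Volume": r"\bvolume\b",
    "ADX": r"\badx\b",
    "Volatility": r"\bvolatil",
    "Ichimoku": r"\bichimoku\b",
}


def coverage(reasoning: str) -> tuple[int, list[str]]:
    """Count which core indicators a final reasoning mentions (case-insensitive)."""
    text = reasoning.lower()
    found = [name for name, pattern in INDICATOR_PATTERNS.items() if re.search(pattern, text)]
    return len(found), found


sample = "RSI is near oversold, MACD just crossed up, and ADX above 25 confirms the trend."
count, found = coverage(sample)
print(f"{count}/{len(INDICATOR_PATTERNS)}: {found}")
```

Applied to each version's final reasoning, a score like this would turn “broadest coverage” into a number that can be compared across prompt versions and models.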

3. Best Balance of Conciseness and Effectiveness

Version 1 is concise but leaves out deeper numeric references or thorough coverage.
Versions 2 & 3 are moderate in length and detail.
Version 4 strikes a nice balance: it references most indicators succinctly, avoiding contradictory numeric claims.
Version 5 covers everything but can be less clear due to inconsistent numeric statements.

Conclusion: Version 4 likely offers the best trade-off between completeness and concise clarity.

4. Patterns in Confidence Levels

Here are the reported confidence scores for each version:

Version 1: 73.0%
Version 2: 75.0%
Version 3: 75.0%
Version 4: 85.0%
Version 5: 85.0%

(This set of data differs slightly from values observed with other prompt sets. Here, confidence actually rises with version number, from 73% for Version 1 to 85% for Versions 4 and 5. With only five data points, and with the other models tested not following the same trend, this is not enough to conclude that longer or more thorough prompts raise confidence.)

Overall Observation: The confidence metric appears to be influenced by how strongly the final text “asserts” a particular outcome, rather than by the thoroughness or brevity of the prompt.

Summary

Most Detailed/Well-Reasoned:
- Version 5 covers all the requested indicators, but the final text has some contradictory numeric references.
- Version 4 is more coherent, also covering most signals without glaring inconsistencies.
Prompt Length/Structure & Reasoning Quality:
- Strong correlation. Prompts that clearly list each indicator (and thresholds) produce more robust final reasonings.
Best Balance of Conciseness & Effectiveness:
- Version 4 is a strong candidate—concise yet covering most points logically.
Confidence Levels:
- Range from 73% to 85% and rise with version number in this run, but that link to prompt length does not hold across models; confidence seems to depend more on the model’s internal “certainty” in the final outcome.

In short:

Version 4 is likely the “best balanced” in clarity and scope.
Version 5 has the broadest coverage but occasionally slips up on numeric accuracy.

GEMMA2:2B

Below is an updated comparative analysis of the five prompt versions (including the newly added Version 5) and their respective outputs, focusing on:

Most detailed and well-reasoned analysis
Correlation between prompt length/structure and reasoning quality
Best balance of conciseness and effectiveness
Patterns in confidence levels

1. Most Detailed and Well-Reasoned Analysis

Let’s look at each version’s final reasoning section and determine how thoroughly it addresses the requested indicators, how coherently it draws conclusions, and whether it cites relevant numeric data or thresholds accurately.

Version 1 (entry_v1)
- Analysis Focus: RSI, Bollinger, ADX, plus a brief reference to volatility compression.
- Detail Level: Modest. The final reasoning references only a subset of indicators and mostly in a qualitative manner.
Version 2 (entry_v2)
- Analysis Focus: RSI, MACD, Ichimoku, volume, ADX, and volatility.
- Detail Level: Moderate. It references several metrics with some detail but does not strongly cite numeric thresholds or expand on how each indicator influences the final decision.
Version 3 (entry_v3)
- Analysis Focus: RSI, MACD, Bollinger, Ichimoku, volume, ADX.
- Detail Level: Moderate. It references multiple indicators but only briefly. Numeric values are mentioned in the prompt data rather than heavily in the final reasoning.
Version 4 (entry_v4)
- Analysis Focus: RSI, MACD, Bollinger, volume, ADX, plus some mention of volatility.
- Detail Level: Fairly good. The final reasoning references many relevant signals in one paragraph but does not deeply compare exact numeric thresholds for each.
Version 5 (entry_v5)
- Analysis Focus: RSI, MACD, Bollinger, volume, ADX, volatility, and “support level validation” (the final reasoning folds Ichimoku into “support” without expanding on it).
- Detail Level: Broad in scope, but the final explanation has internal inconsistencies (e.g., it calls a 15.3052 MACD signal strength “weak” while describing a smaller 0.2453 value as “strong”). These contradictions undermine clarity.

Despite the inconsistencies in the actual numeric comparisons, Version 5 does methodically reference all the listed comparison points: RSI, MACD, Bollinger, volume, ADX, volatility, and support. If we strictly judge by how many criteria it attempts to address (rather than perfect numeric alignment), Version 5 is the most comprehensive in structure.

However, if we prioritize consistency and correctness of the final reasoning, Version 4 may feel more coherent—fewer contradictions, references multiple metrics, and states them more logically.

Conclusion:

Highest coverage of all metrics: Version 5
Most coherent/consistent explanation (fewest contradictions): Possibly Version 4

Depending on whether you favor breadth vs. internal consistency, either Version 4 or Version 5 may come out on top for “most detailed and well-reasoned.”

2. Correlation of Prompt Length & Structure With Reasoning Quality

In general:

Version 1 is a short, informal prompt → yields a concise but only moderately detailed conclusion.
Versions 2–5 are more structured, with each step or indicator spelled out → generally yield more comprehensive final reasonings.

Version 5 is especially structured, clearly listing the comparison points in bullet form (RSI, MACD, Bollinger, etc.). The link between a methodical prompt structure and a wider-ranging final discussion is most apparent here.

Takeaway: More explicit instructions about the exact metrics to compare—and clear thresholds for each—tend to produce deeper, more systematic analyses in the final output.
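Here is a sketch of generating a Version 5-style prompt from a single table, with each comparison point and its threshold spelled out as a bullet. The `build_comparison_prompt` name, the prompt wording, and every threshold are hypothetical. The actual entry_v5 text is not reproduced in this post.

```python
# Hypothetical comparison points and thresholds; entry_v5's real wording is not shown in the post.
CRITERIA = {
    "RSI": "below 30 favors a buy, above 70 argues against it",
    "MACD": "histogram above zero and rising",
    "Bollinger Bands": "price at or below the lower band",
    "Volume": "current volume above its recent average",
    "ADX": "above 25 with +DI above -DI",
    "Volatility": "compression (narrowing bands) before expansion",
    "Support": "price holding above the nearest support level",
}


def build_comparison_prompt(market_a: str, market_b: str, timeframe: str,
                            criteria: dict[str, str]) -> str:
    """Assemble a structured prompt that lists every indicator with an explicit threshold."""
    lines = [
        f"Compare {market_a} and {market_b} on the {timeframe} timeframe.",
        "Evaluate each criterion for both markets and cite the numeric values you used:",
    ]
    lines += [f"- {name}: {rule}" for name, rule in criteria.items()]
    lines.append("Answer with the stronger SWING BUY candidate and a confidence from 0 to 100.")
    return "\n".join(lines)


print(build_comparison_prompt("BTCUSDT", "ETHUSDT", "4h", CRITERIA))
```

Keeping the criteria in one table makes it easy to vary only the prompt structure between versions while holding the indicator set fixed.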

3. Best Balance of Conciseness and Effectiveness

Version 1 is concise but leaves out several indicators in the final explanation.
Versions 2 & 3 each provide moderate detail without going excessively long; they are fairly balanced.
Version 4 references nearly all major indicators concisely in one paragraph.
Version 5 addresses all seven listed criteria in the prompt, but the final reasoning is a bit longer and suffers from some contradictory references to signal strengths.

If “concise yet hits all major points” is the priority, Version 4 arguably achieves a sweet spot. It’s not overly wordy yet covers RSI, MACD, Bollinger, volume, ADX, and volatility. Version 5 is very thorough but has minor internal inconsistencies.

4. Patterns in Confidence Levels

Looking at the reported confidence scores:

Version 1: 75.0%
Version 2: 85.0%
Version 3: 80.0%
Version 4: 75.0%
Version 5: 83.0%

We again see that confidence does not strictly track either prompt length or detail. The highest confidence (85%) appears in Version 2, which is only moderately long, while Version 5, the most thorough, lands at 83%. Versions 1 and 4, each with shorter reasoning, both show 75%.

Conclusion on Confidence: The numeric confidence appears to be an internal model estimate influenced by how strongly the final reasoning justifies a “winner,” not necessarily by the prompt’s thoroughness or length.

Summary

Most Detailed / Well-Reasoned:
- Version 5 covers all the requested metrics, but ironically contains some contradictory references to signal strengths.
- Version 4 avoids big contradictions and still references multiple metrics clearly.
Prompt Length/Structure & Reasoning Quality:
- Strong correlation: versions that specify each indicator and how to compare them produce more thorough final reasonings.
Best Balance of Conciseness & Effectiveness:
- Version 4 hits a concise yet informative sweet spot.
- Version 5 is robust but can be harder to trust due to internal inconsistencies.
Confidence Levels:
- Range from 75% to 85%.
- No direct correlation to prompt size or detail. Likely depends on the internal weighting of signals.

Overall:

If you want maximum coverage of the indicators—albeit with some questionable numeric references—Version 5 attempts the most thorough approach.
If you want a concise yet consistent and logical final explanation, Version 4 seems the best.

DEEPSEEK:R1:1.5B

Below is an updated comparative analysis of the four new prompt versions and their respective generated results, followed by a brief comparison to the previous analysis you requested.

1. Most Detailed and Well-Reasoned Analysis

Version 1 (entry_v1)
- Reasoning: Short and references only a few indicators explicitly (primarily ADX, RSI, Ichimoku, MACD).
- Detail Level: Low to Moderate. It does not systematically address Bollinger, volume, or volatility in the explanation.
Version 2 (entry_v2)
- Reasoning: Moderately thorough. References RSI, MACD, Ichimoku, trend, and volatility. It provides some justification but not many raw numeric values.
- Detail Level: Moderate.
Version 3 (entry_v3)
- Reasoning: Briefer than expected—mentions RSI, MACD, Ichimoku, but does not explicitly discuss Bollinger or volume in detail this time.
- Detail Level: Moderate, but less granular than earlier runs of “entry_v3” you have shared.
Version 4 (entry_v4)
- Reasoning: References most of the relevant indicators (RSI, MACD, Bollinger, volume, ADX, volatility, Ichimoku) with numerical data or threshold comments. It also summarizes that Market B meets multiple “SWING BUY” criteria.
- Detail Level: High. Among these four, Version 4 explicitly addresses nearly all stated metrics in the final reasoning.

Conclusion on Detail/Reasoning: Version 4 presents the widest range of indicator references and relates them directly to the buy criteria, making it the most detailed and well-reasoned among these four new results.

2. Correlation Between Prompt Length & Structure and Reasoning Quality

Version 1 has a short, informal prompt and yields a brief final analysis.
Versions 2, 3, and 4 have more structured, step-by-step prompts. Each also yields a more elaborate reasoning.

In this set, Version 4 is both structured and explicitly instructs the model to check each indicator against a threshold, which encourages a comprehensive final explanation. So again, we see a positive correlation: more structured (and somewhat longer) prompts lead to more complete reasoning.

3. Best Balance of Conciseness and Effectiveness

Version 1 is concise but omits significant detail.
Version 2 and Version 3 are moderate in length and detail, but each is missing some coverage of metrics in the final reasoning.
Version 4 is somewhat longer in the final summary but covers every indicator point (RSI, MACD, Bollinger, volume, ADX, volatility, Ichimoku).

If “concise yet comprehensive” is the goal, Version 4 is the best balance here: it methodically checks each indicator against clear thresholds within a single paragraph. It is not the shortest output, but it covers every point.

4. Patterns in Confidence Levels

New confidence scores are:

Version 1: 85.0%
Version 2: 80.0%
Version 3: 70.0%
Version 4: 75.0%

There is no clear linear pattern (e.g., “longer prompt = higher confidence”); indeed, Version 1’s short approach yields the highest confidence (85%), while the most thorough Version 4 is 75%. This aligns with past observations that the confidence metric in these outputs depends more on the internal weighting of signals than strictly on prompt detail.
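A rank correlation makes this check explicit across all three models. The sketch below computes Spearman's rho between prompt version number and reported confidence for each model. Version number stands in for prompt structure. That is an assumption: the versions get more structured as the number rises, but the scale is only ordinal. The confidence values are the ones reported in this post.

```python
def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors (handles ties)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Confidence scores (%) as reported for each prompt version in this post.
confidence = {
    "LLAMA3.2:3B": [73.0, 75.0, 75.0, 85.0, 85.0],
    "GEMMA2:2B": [75.0, 85.0, 80.0, 75.0, 83.0],
    "DEEPSEEK:R1:1.5B": [85.0, 80.0, 70.0, 75.0],
}
for model, scores in confidence.items():
    versions = list(range(1, len(scores) + 1))
    print(f"{model}: rho = {spearman(versions, scores):+.2f}")
```

On these numbers rho comes out at about +0.95 for LLAMA3.2:3B, +0.15 for GEMMA2:2B and -0.80 for DEEPSEEK:R1:1.5B. The direction flips between models, which supports the conclusion that prompt structure does not drive confidence in any consistent way. With only four or five points per model, none of these coefficients is statistically meaningful on its own.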

Comparison With Your Previous Results

In a previous set of “entry_v1–v4” analyses, we observed:

Version 3 was the most detailed and well-reasoned (at that time).
Confidence levels tended to fall between 75% and 85%, similar to the 70% to 85% range in this run.
Prompts that specified step-by-step indicators and thresholds led to more comprehensive reasoning.
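Comparing runs is easier when confidence is stored on a single scale, since a model may report it as a fraction (0.75) in one run and as a percentage (75.0%) in another. Below is a minimal normalizer. It assumes that strings ending in “%” are already percentages and that bare numbers at or below 1.0 are fractions. Under that assumption, a bare `1` would be read as 100%.

```python
from typing import Union


def to_percent(raw: Union[float, str]) -> float:
    """Normalize a confidence value to a 0-100 percentage (heuristic, see assumptions above)."""
    is_percent = isinstance(raw, str) and raw.strip().endswith("%")
    value = float(raw.strip().rstrip("%")) if isinstance(raw, str) else float(raw)
    if not is_percent and value <= 1.0:
        value *= 100.0  # treat bare numbers at or below 1.0 as fractions
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"confidence out of range: {raw!r}")
    return round(value, 6)  # trims float noise such as 7.000000000000001


print([to_percent(v) for v in (0.75, 0.85, "85.0%", 73)])
```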

Comparing the new results:

Version 1 previously showed shorter reasoning and somewhat lower detail; it remains similar in that regard, even though the winner changed from ETHUSDT to BTCUSDT.
Version 4 in the new batch is significantly more thorough in citing each indicator, whereas the previous best was Version 3. So there is a shift: the new Version 3 is less exhaustive than it was before, and the new Version 4 now comes out on top in detail.

Despite these changes, the broad patterns hold:

Structured prompts lead to more comprehensive outputs.
Confidence levels still cluster around 70–85%.
The “best” version in terms of detail and clarity can change depending on how the final reasoning text is generated.

Summary

Most Detailed & Well-Reasoned: Version 4 in this new series.
Length vs. Quality: Longer, more structured prompts still generate richer conclusions.
Best Balance of Conciseness and Effectiveness: Version 4, which addresses every indicator in a single clear paragraph.
Confidence Levels: Range from 70% to 85%, consistent with prior patterns of moderate-high certainty, with no direct relationship to prompt length.

In comparison to your older prompts, the same general principles stand, though Versions 3 and 4 swapped places in thoroughness.
