
LLM Behavior Insights: Technical Analysis and Market Reasoning

Max Kaido
Architect

While implementing large language models for technical analysis and market reasoning, we've discovered several fascinating insights about how these models process and reason about market data. These findings challenge common assumptions about temperature settings and prompt engineering, while offering practical solutions for more reliable analysis.

Overview

Main discoveries we'll explore:

  • The Temperature Paradox: Higher precision at higher temperatures
  • Prompt Style Impact: How different prompt structures affect reasoning quality
  • Multi-Perspective Analysis: Using varied prompts for confidence validation
  • Real-world Examples: Concrete cases of model behavior in technical analysis

The Temperature Paradox

One of our most surprising discoveries was that temperature settings don't always behave as expected. Conventional wisdom holds that low temperatures, with T=0 as the extreme, produce the most precise and deterministic outputs. However, we found cases where T=0 produced incorrect numerical comparisons and flawed RSI interpretations, while higher temperatures sometimes yielded more accurate analysis.

This challenges the common practice of using T=0 for precise tasks. The explanation might lie in how temperature affects the model's access to its knowledge base:

  • At T=0: The model might be overly constrained, falling back to pattern matching rather than applying its full understanding
  • At higher T: The model might better access its broader knowledge, leading to more nuanced analysis
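For reference, this is what temperature does mechanically: the model's logits are divided by T before the softmax, and T=0 is conventionally treated as greedy decoding (always pick the highest-scoring token). The sketch below is a minimal illustration with invented logits and a hypothetical `softmaxWithTemperature` helper; it is not tied to any particular model API.

```typescript
// Illustrative only: the logits are invented, not taken from a real model.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (logits.length === 0) return [];
  if (temperature <= 0) {
    // T=0 is treated as greedy decoding: all probability on the largest logit.
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  const scaled = logits.map((l) => l / temperature);
  const maxScaled = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - maxScaled));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const exampleLogits = [2.0, 1.5, 0.5];
console.log(softmaxWithTemperature(exampleLogits, 0));   // greedy: [1, 0, 0]
console.log(softmaxWithTemperature(exampleLogits, 0.7)); // peaked distribution
console.log(softmaxWithTemperature(exampleLogits, 1.5)); // flatter distribution
```

Note what this does and does not show: T=0 removes sampling randomness, but it does nothing to make the top-ranked token correct. If the highest-scoring continuation is wrong, greedy decoding will pick it every time.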

Prompt Style Impact

We identified two distinct prompt styles that excel in different aspects:

  1. Natural Language Style (v1):
// Example from v1.ts
'Analyze these market conditions and identify potential trading opportunities...';
  2. Structured Analysis Style (v8):
// Example from v8.ts (template literal, since the prompt spans several lines)
`1. Compare RSI values
2. Evaluate volume profiles
3. Assess trend direction...`;

Each style showed unique strengths:

  • Natural Language (v1): Better at explaining reasoning and pattern recognition
  • Structured (v8): Superior at systematic comparisons and decision-making

Multi-Perspective Analysis

Rather than relying on temperature variations alone, we developed a more robust approach using multiple prompt styles:

  1. Use the same temperature for every prompt
  2. Apply different prompt styles:
    • Pattern recognition prompt
    • Systematic analysis prompt
    • Comparative analysis prompt
  3. Cross-validate results across prompts

This approach provides several benefits:

  • Multiple perspectives on the same data
  • Built-in validation through prompt diversity
  • Higher confidence when different styles agree
  • Better error detection when styles disagree
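The cross-validation step can be sketched as a small agreement check. Everything here is hypothetical naming (`Verdict`, `CrossCheck`, `crossValidate`), and collecting the verdicts themselves (one model call per prompt style, all at the same temperature) is left to the caller so the sketch stays independent of any particular client.

```typescript
// Hypothetical names throughout; the verdicts are assumed to have been
// collected already, one per prompt style, at a single temperature.
type Verdict = "oversold" | "neutral" | "overbought";

interface CrossCheck {
  agreed: boolean;          // true when every prompt style gave the same verdict
  majority: Verdict | null; // null when there are no verdicts or a tie for first place
  dissenting: string[];     // styles that differ from the majority (all styles on a tie)
}

function crossValidate(verdicts: Record<string, Verdict>): CrossCheck {
  const styles = Object.keys(verdicts);
  const counts = new Map<Verdict, number>();
  for (const style of styles) {
    const v = verdicts[style];
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  // Rank verdicts by how many prompt styles produced them.
  const ranked = Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
  const tie = ranked.length > 1 && ranked[0][1] === ranked[1][1];
  const majority = ranked.length === 0 || tie ? null : ranked[0][0];
  const dissenting =
    majority === null ? styles : styles.filter((s) => verdicts[s] !== majority);
  return { agreed: counts.size === 1, majority, dissenting };
}

const check = crossValidate({
  patternRecognition: "neutral",
  systematic: "neutral",
  comparative: "oversold",
});
console.log(check);
// agreed: false, majority: "neutral", dissenting: ["comparative"]
```

A disagreement does not say which style is right. It only flags the result for a deterministic check or a human look, which is the error-detection benefit listed above.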

Real-world Examples

Case Study: RSI Analysis

In one notable case, the model at T=0 produced this analysis:

Market A has RSI of 45 and Market B has RSI of 65
Therefore, Market A is oversold and Market B is neutral

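This revealed a fundamental misunderstanding of RSI thresholds, despite the low temperature setting. By the conventional reading, RSI below 30 is oversold and RSI above 70 is overbought, so both 45 and 65 fall in the neutral range and Market A is not oversold. The same comparison with our multi-perspective approach caught this error through conflicting interpretations.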
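A deterministic check like the one below would have flagged this output without involving the model at all. It is a minimal sketch assuming the conventional 30/70 thresholds (some strategies use other cut-offs); `classifyRsi` and `claimMatches` are hypothetical helper names.

```typescript
// Conventional RSI thresholds (30/70) are assumed; pass other values if your
// strategy uses different cut-offs.
type RsiZone = "oversold" | "neutral" | "overbought";

function classifyRsi(rsi: number, lower = 30, upper = 70): RsiZone {
  if (!isFinite(rsi) || rsi < 0 || rsi > 100) {
    throw new RangeError(`RSI must be between 0 and 100, got ${rsi}`);
  }
  if (rsi < lower) return "oversold";
  if (rsi > upper) return "overbought";
  return "neutral";
}

// Compare the label the model claimed with the label computed from the number.
function claimMatches(rsi: number, claimed: RsiZone): boolean {
  return classifyRsi(rsi) === claimed;
}

console.log(classifyRsi(45));              // "neutral"
console.log(classifyRsi(65));              // "neutral"
console.log(claimMatches(45, "oversold")); // false: the claim in the case study fails
console.log(claimMatches(65, "neutral"));  // true
```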

Key Takeaways

  • Temperature settings may have counter-intuitive effects on model precision
  • Different prompt styles excel at different aspects of analysis
  • Multi-perspective prompting can provide more reliable results than temperature tuning
  • Model confidence doesn't always correlate with accuracy
  • Basic numerical comparisons should be validated, even at T=0

Next Steps

For developers implementing LLMs in technical analysis:

  1. Test your assumptions about temperature settings
  2. Develop multiple prompt styles for different aspects of analysis
  3. Implement cross-validation between different prompt styles
  4. Add explicit validation for basic numerical comparisons
  5. Consider using ensemble approaches for critical decisions

Remember: The goal isn't just to get an answer, but to get a reliable answer with well-understood confidence levels.

Internal Research Notes

Initial Observation

We observed a single case in which the model, at temperature 0 and reporting a confidence of 0.75, produced this reasoning:

// Single observed case
const observation = {
  reasoning: "Market A has RSI of 45 and Market B has RSI of 65. Therefore, Market A is oversold and Market B is neutral",
  confidence: 0.75,
  // Other fields omitted for clarity
};

This single observation raises several interesting questions:

Questions About Temperature

  1. Temperature and Precision

    • What role does temperature actually play in numerical comparisons?
    • Could temperature 0 be limiting the model's access to its knowledge in some way?
    • How would this same comparison fare at different temperatures?
    • Is our assumption about temperature 0 being "most precise" worth questioning?
  2. Confidence Interpretation

    • What does a 0.75 confidence score mean in this context?
    • How does the model arrive at its confidence score?
    • Would the confidence score change with temperature?
    • Is confidence related to internal consistency or actual accuracy?
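One way to start on the last question is to bucket answers by reported confidence and compare each bucket's accuracy against it. This is a sketch only: `ScoredAnswer` and `calibrationBuckets` are hypothetical names, the `correct` flag is assumed to come from a deterministic check, and the sample input is synthetic, not measured data.

```typescript
// Hypothetical record shape; `correct` would come from a deterministic check.
interface ScoredAnswer {
  confidence: number; // model-reported, expected in the range 0..1
  correct: boolean;
}

interface Bucket {
  range: string;
  count: number;
  accuracy: number | null; // null for empty buckets
}

function calibrationBuckets(answers: ScoredAnswer[], bucketCount = 4): Bucket[] {
  const totals: number[] = [];
  const hits: number[] = [];
  for (let i = 0; i < bucketCount; i++) {
    totals.push(0);
    hits.push(0);
  }
  for (const a of answers) {
    if (!isFinite(a.confidence)) continue; // skip malformed scores
    // Clamp so that a confidence of exactly 1.0 lands in the last bucket.
    const i = Math.min(bucketCount - 1, Math.max(0, Math.floor(a.confidence * bucketCount)));
    totals[i] += 1;
    if (a.correct) hits[i] += 1;
  }
  return totals.map((count, i) => ({
    range: `${(i / bucketCount).toFixed(2)}-${((i + 1) / bucketCount).toFixed(2)}`,
    count,
    accuracy: count === 0 ? null : hits[i] / count,
  }));
}

// Synthetic input, for shape only.
const syntheticAnswers: ScoredAnswer[] = [
  { confidence: 0.75, correct: false },
  { confidence: 0.8, correct: true },
  { confidence: 0.3, correct: true },
];
console.log(calibrationBuckets(syntheticAnswers));
```

If accuracy in the high-confidence buckets is no better than in the low ones, the reported score is probably not tracking correctness. A single observation, like ours, cannot show this either way.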

Questions About Reasoning Process

  1. Understanding vs Pattern Matching

    • Is the model actually performing numerical comparison?
    • How does it arrive at the "oversold" conclusion for RSI 45?
    • Could this be pattern matching rather than understanding?
    • What other examples would help us differentiate between the two?
  2. Prompt Impact

    • How would different prompt structures affect this same analysis?
    • What if we asked for the comparison in different ways?
    • Could the prompt itself be leading to these interpretations?

Research Directions

  1. Immediate Questions to Investigate

    • Can we replicate this behavior with similar RSI values?
    • What happens with clearly oversold/overbought values (e.g., 20 vs 80)?
    • Does the model maintain consistent RSI interpretations across different pairs?
    • How does the reasoning change if we ask for explicit RSI threshold comparisons?
  2. Broader Questions

    • How do we validate the model's understanding of technical concepts?
    • What constitutes a good confidence metric for technical analysis?
    • Could multiple prompting styles help reveal understanding depth?
    • How do we distinguish between memorized patterns and actual comprehension?

Potential Approaches to Test

  1. Comparative Analysis

    • What if we tried the same comparison with different prompt styles?
    • Could we learn more by varying only one parameter at a time?
    • How would explicit vs implicit questions about RSI affect the response?
  2. Validation Ideas

    • What would constitute a minimal test set for RSI understanding?
    • How could we systematically explore the model's technical analysis capabilities?
    • What baseline comparisons should we establish first?
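As a starting point for the minimal test set, pairs can be generated from a handful of probe values with the expected label computed up front. The probe values and names below (`probeValues`, `buildPairCases`) are our own illustrative choices, and the expected labels assume the conventional 30/70 thresholds.

```typescript
type RsiZone = "oversold" | "neutral" | "overbought";

interface RsiPairCase {
  rsiA: number;
  rsiB: number;
  expectedA: RsiZone;
  expectedB: RsiZone;
}

// Conventional 30/70 thresholds, assumed here.
const zoneOf = (rsi: number): RsiZone =>
  rsi < 30 ? "oversold" : rsi > 70 ? "overbought" : "neutral";

// Chosen to cover clear extremes (20, 80), both threshold edges, and the
// 45/65 pair from the original observation.
const probeValues = [20, 29, 30, 45, 65, 70, 71, 80];

// Build every unordered pair of probe values with its expected labels.
function buildPairCases(values: number[]): RsiPairCase[] {
  const cases: RsiPairCase[] = [];
  for (let i = 0; i < values.length; i++) {
    for (let j = i + 1; j < values.length; j++) {
      cases.push({
        rsiA: values[i],
        rsiB: values[j],
        expectedA: zoneOf(values[i]),
        expectedB: zoneOf(values[j]),
      });
    }
  }
  return cases;
}

const pairCases = buildPairCases(probeValues);
console.log(pairCases.length); // 28 unordered pairs from 8 values
```

Each case can then be sent through every prompt style, varying one parameter at a time, and the model's labels compared with `expectedA` and `expectedB`.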

Next Steps

  1. Data Collection Needs

    • What additional examples do we need to collect?
    • How can we systematically document model responses?
    • What metadata should we track with each observation?
  2. Investigation Structure

    • How do we organize our exploration of these questions?
    • What would a systematic testing framework look like?
    • How do we prioritize which questions to investigate first?
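For documenting responses systematically, one option is a typed record written as one JSON object per line. The schema below is a suggestion, not something we have in place: only `reasoning`, `confidence`, and the temperature come from the observation above, and every other field name and placeholder value is an assumption.

```typescript
// Hypothetical schema: only `reasoning`, `confidence`, and `temperature` are
// taken from the original observation; the other fields are suggestions.
interface Observation {
  observedAt: string;             // ISO 8601 timestamp
  model: string;                  // model identifier as reported by the provider
  temperature: number;
  promptStyle: string;            // e.g. "v1" or "v8"
  inputs: Record<string, number>; // e.g. the RSI values given to the model
  reasoning: string;
  confidence: number;
  validated: boolean | null;      // null until a deterministic check has run
}

// One JSON object per line (JSON Lines) keeps the log append-only and easy to diff.
function toLogLine(o: Observation): string {
  return JSON.stringify(o);
}

const firstObservation: Observation = {
  observedAt: new Date(0).toISOString(), // placeholder timestamp
  model: "unknown",                      // placeholder: not recorded in the original note
  temperature: 0,
  promptStyle: "unknown",                // placeholder: not recorded in the original note
  inputs: { rsiA: 45, rsiB: 65 },
  reasoning:
    "Market A has RSI of 45 and Market B has RSI of 65. Therefore, Market A is oversold and Market B is neutral",
  confidence: 0.75,
  validated: false, // RSI 45 is not oversold under the conventional 30/70 thresholds
};
console.log(toLogLine(firstObservation));
```

Recording the prompt style and inputs alongside each response is what makes it possible to vary one parameter at a time later.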

Remember: This is a starting point for investigation. Our single observation raises interesting questions but doesn't provide answers. The value lies in exploring these questions systematically and remaining open to unexpected findings.