LLM Behavior Insights: Technical Analysis and Market Reasoning

February 17, 2025 · 4 min read

Architect

While implementing large language models for technical analysis and market reasoning, we've discovered several fascinating insights about how these models process and reason about market data. These findings challenge common assumptions about temperature settings and prompt engineering, while offering practical solutions for more reliable analysis.

Overview

Main discoveries we'll explore:

The Temperature Paradox: Higher precision at higher temperatures
Prompt Style Impact: How different prompt structures affect reasoning quality
Multi-Perspective Analysis: Using varied prompts for confidence validation
Real-world Examples: Concrete cases of model behavior in technical analysis

The Temperature Paradox

One of our most surprising discoveries was that temperature settings don't always behave as expected. Conventional wisdom suggests that lower temperatures (T=0) should produce more precise and deterministic outputs. However, we found cases where a temperature of 0 produced incorrect numerical comparisons and flawed RSI interpretations, while higher temperatures sometimes yielded more accurate analysis.

This challenges the common practice of using T=0 for precise tasks. One possible explanation, which remains a hypothesis, is that temperature changes how much of its learned knowledge the model draws on:

At T=0: The model might be overly constrained, falling back to pattern matching rather than applying its full understanding
At higher T: The model might better access its broader knowledge, leading to more nuanced analysis
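One way to test this hypothesis rather than assume it is to score runs at several temperatures against an independently computed answer. The sketch below is only an illustration: the `TemperatureRun` shape and the `accuracyByTemperature` name are hypothetical, and it assumes each run has already been marked correct or incorrect by a separate check.

```typescript
// Hypothetical record of one model run; field names are illustrative only.
interface TemperatureRun {
  temperature: number;
  // Whether the model's answer matched an independently computed ground truth.
  correct: boolean;
}

// Group runs by temperature and return the fraction of correct runs per group.
// Keys are the temperature values converted to strings (0 becomes "0").
function accuracyByTemperature(runs: TemperatureRun[]): Record<string, number> {
  const totals: Record<string, { correct: number; total: number }> = {};
  for (const run of runs) {
    const key = String(run.temperature);
    const entry = totals[key] ?? { correct: 0, total: 0 };
    entry.total += 1;
    if (run.correct) entry.correct += 1;
    totals[key] = entry;
  }
  const accuracy: Record<string, number> = {};
  for (const key of Object.keys(totals)) {
    accuracy[key] = totals[key].correct / totals[key].total;
  }
  return accuracy;
}
```

Keeping the prompt and input fixed while only the temperature varies makes any difference in accuracy easier to attribute to temperature itself.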

Prompt Style Impact

We identified two distinct prompt styles that excel in different aspects:

Natural Language Style (v1):

// Example from v1.ts
'Analyze these market conditions and identify potential trading opportunities...';

Structured Analysis Style (v8):

// Example from v8.ts
`1. Compare RSI values
2. Evaluate volume profiles
3. Assess trend direction...`;

Each style showed unique strengths:

Natural Language (v1): Better at explaining reasoning and pattern recognition
Structured (v8): Superior at systematic comparisons and decision-making

Multi-Perspective Analysis

Rather than relying on temperature variations alone, we developed a more robust approach using multiple prompt styles:

1. Use consistent temperature
2. Apply different prompt styles:
   - Pattern recognition prompt
   - Systematic analysis prompt
   - Comparative analysis prompt
3. Cross-validate results across prompts

This approach provides several benefits:

Multiple perspectives on the same data
Built-in validation through prompt diversity
Higher confidence when different styles agree
Better error detection when styles disagree
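The cross-validation step can be sketched as a simple agreement check over the conclusions each prompt style reached. This is a minimal sketch, not our production logic: `StyleVerdict`, `CrossValidation`, and `crossValidate` are hypothetical names, and it assumes each style's output has already been reduced to a single comparable label.

```typescript
// Hypothetical shape of one prompt style's verdict on the same market data.
interface StyleVerdict {
  style: string; // e.g. "pattern", "systematic", "comparative"
  label: string; // the conclusion reached under that style
}

interface CrossValidation {
  agreed: boolean;              // true only when every style gave the same label
  majorityLabel: string | null; // most common label (first seen wins ties)
  agreement: number;            // share of styles backing the majority label, 0..1
  dissenting: string[];         // styles whose label differs from the majority
}

function crossValidate(verdicts: StyleVerdict[]): CrossValidation {
  if (verdicts.length === 0) {
    return { agreed: false, majorityLabel: null, agreement: 0, dissenting: [] };
  }
  // Prototype-free object so arbitrary label strings are safe as keys.
  const counts: Record<string, number> = Object.create(null);
  for (const v of verdicts) {
    counts[v.label] = (counts[v.label] ?? 0) + 1;
  }
  let majorityLabel = verdicts[0].label;
  for (const v of verdicts) {
    if (counts[v.label] > counts[majorityLabel]) majorityLabel = v.label;
  }
  const dissenting = verdicts
    .filter((v) => v.label !== majorityLabel)
    .map((v) => v.style);
  return {
    agreed: dissenting.length === 0,
    majorityLabel,
    agreement: counts[majorityLabel] / verdicts.length,
    dissenting,
  };
}
```

A disagreement does not say which style is right; it only flags the case for closer review, which is the error-detection benefit described above.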

Real-world Examples

Case Study: RSI Analysis

In one notable case, the model at T=0 produced this analysis:

Market A has RSI of 45 and Market B has RSI of 65
Therefore, Market A is oversold and Market B is neutral

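This revealed a fundamental misunderstanding of RSI thresholds, despite the low temperature setting. Under the conventional thresholds (below 30 is oversold, above 70 is overbought), both 45 and 65 fall in the neutral range, so labeling Market A as oversold is wrong. The same comparison with our multi-perspective approach caught this error through conflicting interpretations.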
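An error like this can also be caught without a second model call, by checking the model's label against a rule-based one. The sketch below assumes the conventional 30/70 thresholds and treats the boundary values themselves as neutral, which is a choice rather than a standard; `classifyRsi` and `checkRsiClaim` are hypothetical helper names.

```typescript
type RsiZone = "oversold" | "neutral" | "overbought";

// Classify an RSI reading. Defaults follow the conventional 30/70 thresholds;
// readings exactly at a threshold are treated as neutral (an assumption).
function classifyRsi(rsi: number, oversoldBelow = 30, overboughtAbove = 70): RsiZone {
  if (!(rsi >= 0 && rsi <= 100)) {
    throw new RangeError(`RSI must be between 0 and 100, got ${rsi}`);
  }
  if (rsi < oversoldBelow) return "oversold";
  if (rsi > overboughtAbove) return "overbought";
  return "neutral";
}

// Compare the zone claimed by the model with the rule-based zone.
function checkRsiClaim(rsi: number, claimed: RsiZone): { ok: boolean; expected: RsiZone } {
  const expected = classifyRsi(rsi);
  return { ok: expected === claimed, expected };
}
```

For the case above, `checkRsiClaim(45, "oversold")` reports a mismatch with an expected zone of `"neutral"`, while `checkRsiClaim(65, "neutral")` passes.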

Key Takeaways

Temperature settings may have counter-intuitive effects on model precision
Different prompt styles excel at different aspects of analysis
Multi-perspective prompting can provide more reliable results than temperature tuning
Model confidence doesn't always correlate with accuracy
Basic numerical comparisons should be validated, even at T=0

Next Steps

For developers implementing LLMs in technical analysis:

Test your assumptions about temperature settings
Develop multiple prompt styles for different aspects of analysis
Implement cross-validation between different prompt styles
Add explicit validation for basic numerical comparisons
Consider using ensemble approaches for critical decisions

Remember: The goal isn't just to get an answer, but to get a reliable answer with well-understood confidence levels.

Internal Research Notes

Initial Observation

We observed a single case where at temperature 0, with a confidence of 0.75, the model produced this reasoning:

// Single observed case
const observedCase = {
  reasoning: "Market A has RSI of 45 and Market B has RSI of 65. Therefore, Market A is oversold and Market B is neutral",
  confidence: 0.75,
  // Other fields omitted for clarity
};

This single observation raises several interesting questions:

Questions About Temperature

1. Temperature and Precision
   - What role does temperature actually play in numerical comparisons?
   - Could temperature 0 be limiting the model's access to its knowledge in some way?
   - How would this same comparison fare at different temperatures?
   - Is our assumption about temperature 0 being "most precise" worth questioning?
2. Confidence Interpretation
   - What does a 0.75 confidence score mean in this context?
   - How does the model arrive at its confidence score?
   - Would the confidence score change with temperature?
   - Is confidence related to internal consistency or actual accuracy?

Questions About Reasoning Process

1. Understanding vs Pattern Matching
   - Is the model actually performing numerical comparison?
   - How does it arrive at the "oversold" conclusion for RSI 45?
   - Could this be pattern matching rather than understanding?
   - What other examples would help us differentiate between the two?
2. Prompt Impact
   - How would different prompt structures affect this same analysis?
   - What if we asked for the comparison in different ways?
   - Could the prompt itself be leading to these interpretations?

Research Directions

1. Immediate Questions to Investigate
   - Can we replicate this behavior with similar RSI values?
   - What happens with clearly oversold/overbought values (e.g., 20 vs 80)?
   - Does the model maintain consistent RSI interpretations across different pairs?
   - How does the reasoning change if we ask for explicit RSI threshold comparisons?
2. Broader Questions
   - How do we validate the model's understanding of technical concepts?
   - What constitutes a good confidence metric for technical analysis?
   - Could multiple prompting styles help reveal understanding depth?
   - How do we distinguish between memorized patterns and actual comprehension?

Potential Approaches to Test

1. Comparative Analysis
   - What if we tried the same comparison with different prompt styles?
   - Could we learn more by varying only one parameter at a time?
   - How would explicit vs implicit questions about RSI affect the response?
2. Validation Ideas
   - What would constitute a minimal test set for RSI understanding?
   - How could we systematically explore the model's technical analysis capabilities?
   - What baseline comparisons should we establish first?
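As one possible starting point for a minimal test set, the sketch below pairs RSI values with the labels expected under the conventional 30/70 thresholds and reports which cases a model got wrong. The 45/65 and 20/80 pairs come from the notes above; the other cases, and every type and function name, are assumptions made for illustration.

```typescript
type Zone = "oversold" | "neutral" | "overbought";

// One comparison case: two RSI readings and the zone expected for each.
interface RsiPairCase {
  rsiA: number;
  rsiB: number;
  expectedA: Zone;
  expectedB: Zone;
}

// Expected labels assume the conventional 30/70 thresholds.
const minimalRsiCases: RsiPairCase[] = [
  { rsiA: 45, rsiB: 65, expectedA: "neutral", expectedB: "neutral" },     // the observed pair
  { rsiA: 20, rsiB: 80, expectedA: "oversold", expectedB: "overbought" }, // clear extremes
  { rsiA: 80, rsiB: 20, expectedA: "overbought", expectedB: "oversold" }, // same values, order swapped
  { rsiA: 25, rsiB: 50, expectedA: "oversold", expectedB: "neutral" },    // one extreme, one neutral
];

// The model's labels for one case, already parsed from its response.
interface ModelAnswer {
  zoneA: Zone;
  zoneB: Zone;
}

// Return the indices of cases where either label differs from the expected one.
function failingCases(cases: RsiPairCase[], answers: ModelAnswer[]): number[] {
  if (cases.length !== answers.length) {
    throw new Error("cases and answers must align one-to-one");
  }
  const failures: number[] = [];
  cases.forEach((c, i) => {
    const a = answers[i];
    if (a.zoneA !== c.expectedA || a.zoneB !== c.expectedB) failures.push(i);
  });
  return failures;
}
```

Running the same cases under each prompt style and temperature, one parameter at a time, would give the baseline comparisons asked about above.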

Next Steps

1. Data Collection Needs
   - What additional examples do we need to collect?
   - How can we systematically document model responses?
   - What metadata should we track with each observation?
2. Investigation Structure
   - How do we organize our exploration of these questions?
   - What would a systematic testing framework look like?
   - How do we prioritize which questions to investigate first?
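To make the documentation question concrete, here is one possible shape for an observation record, serialized as one JSON object per line so observations can be appended to a log file. Every field name is an assumption about what might be worth tracking, not a settled schema.

```typescript
// Hypothetical observation record; all field names are illustrative.
interface ObservationRecord {
  observedAt: string;            // ISO 8601 timestamp
  model: string;                 // model identifier as reported by the provider
  temperature: number;
  promptVersion: string;         // e.g. "v1" or "v8"
  input: Record<string, number>; // the values shown to the model, e.g. RSI per market
  reasoning: string;             // the model's reasoning, verbatim
  confidence: number;            // the model's self-reported confidence
}

// Serialize a record as a single line of JSON, suitable for an append-only log.
function toLogLine(record: ObservationRecord): string {
  return JSON.stringify(record);
}

// Parse a log line back into a record. No schema validation is performed.
function fromLogLine(line: string): ObservationRecord {
  return JSON.parse(line) as ObservationRecord;
}
```

Storing the reasoning verbatim alongside the inputs lets a later rule-based check be rerun over old observations without calling the model again.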

Remember: This is a starting point for investigation. Our single observation raises interesting questions but doesn't provide answers. The value lies in exploring these questions systematically and remaining open to unexpected findings.
