Optimizing Market Analysis with Model Weights: A Data-Driven Approach

January 18, 2025 · 3 min read

Introduction

As part of our ongoing effort to improve the accuracy of market analysis, we studied how well each of our LLMs interprets technical analysis data. This post presents the findings and the weight optimization strategy we derived from them.

Model Performance Analysis

We evaluated each model's ability to correctly interpret and compare technical indicators, focusing on:

- Accuracy in numeric interpretation
- Consistency in indicator analysis
- Proper understanding of technical patterns
- Alignment with established TA principles

Detailed Findings

export const MODEL_WEIGHTS = {
  gemma2: 1.3, // Most accurate (no numeric mix-ups, solid alignment)
  mistral: 1.2, // Very close second (correct on ADX/MACD but slight ATR confusion)
  llama3_2: 1.05, // Third: decent structure, minor RSI "oversold" mislabel
  falcon3: 1.0, // Baseline among the remainder
  phi3_5: 0.95, // More factual mismatches
  qwen2_5: 0.9, // Inverted ADX values, calling RSI ~59 "oversold"
  marco_o1: 0.9, // Good structure but also inverts ADX
} as const;

Rationale

gemma2 (1.3)
- Highest accuracy in technical analysis interpretation
- No contradictory data inversions
- Properly handles MACD crossovers and ADX strength comparisons
- Maintains internal consistency across different indicators
mistral (1.2)
- Strong performance in ADX and MACD interpretation
- Minor confusion with ATR references
- Generally reliable for trend confirmation
- Good balance of accuracy and speed
llama3_2 (1.05)
- Solid overall structure in analysis
- Minor issues with RSI threshold interpretation
- Good pattern recognition capabilities
- Reliable for basic trend analysis
falcon3 (1.0)
- Chosen as the baseline model
- Average performance across indicators
- Consistent but not exceptional
- Good foundation for comparison
Lower-weighted models: phi3_5 (0.95), qwen2_5 (0.9), marco_o1 (0.9)
- More frequent factual errors (phi3_5)
- Inverted ADX values (qwen2_5 and marco_o1)
- Confusion with technical thresholds (qwen2_5 labelled RSI ~59 as "oversold")
- Still valuable for consensus building
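
Threshold errors like the RSI mislabel noted above are easy to detect mechanically. The following is a minimal sketch, not the evaluation code used for this study: `rsiZone` and `isRsiLabelConsistent` are hypothetical names, and the 30/70 levels are the conventional RSI thresholds, assumed here rather than taken from our configuration.

```typescript
// Hypothetical helpers. The 30/70 thresholds are the conventional RSI levels
// and are an assumption, not values confirmed by the study.
type RsiZone = "oversold" | "neutral" | "overbought";

function rsiZone(rsi: number, oversold = 30, overbought = 70): RsiZone {
  if (Number.isNaN(rsi) || rsi < 0 || rsi > 100) {
    throw new RangeError(`RSI must be within 0..100, got ${rsi}`);
  }
  if (rsi < oversold) return "oversold";
  if (rsi > overbought) return "overbought";
  return "neutral";
}

// True when the label a model attached to an RSI reading matches the zone
// implied by the numeric value.
function isRsiLabelConsistent(rsi: number, modelLabel: RsiZone): boolean {
  return rsiZone(rsi) === modelLabel;
}
```

Under these assumptions, `isRsiLabelConsistent(59, "oversold")` returns `false`, which is the kind of mismatch that lowered a model's weight.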

Implementation Details

The weights are applied in our confidence calculation:

private calculateOverallConfidence(state: GistState): number {
  if (state.comparisons.length === 0) return 0;

  const total = state.comparisons.reduce((sum, comparison) => {
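    // Apply model weight; fall back to 1.0 only when no weight is configured
    // (?? keeps an explicit weight of 0, which || would silently replace)
    const weight = state.config.modelWeights[comparison.modelUsed] ?? 1.0;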

    // Position-based boost for critical rankings
    const positionBoost = this.isTopMarketComparison(comparison, state) ? 1.1 : 1.0;

    return sum + (comparison.confidence * weight * positionBoost);
  }, 0);

  return total / state.comparisons.length;
}
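
To make the arithmetic concrete, here is a standalone sketch of the same calculation with three sample comparisons. The `Comparison` shape, the `isTopMarket` flag, and the sample confidence values are assumptions for illustration; they stand in for `GistState` and `isTopMarketComparison`, which are not shown in this post.

```typescript
// Standalone sketch. The Comparison interface and isTopMarket flag are
// assumptions standing in for GistState and isTopMarketComparison.
interface Comparison {
  modelUsed: string;
  confidence: number; // assumed to lie in the range 0..1
  isTopMarket: boolean;
}

function overallConfidence(
  comparisons: Comparison[],
  modelWeights: Record<string, number>,
): number {
  if (comparisons.length === 0) return 0;

  const total = comparisons.reduce((sum, c) => {
    // Models missing from the weight table fall back to 1.0.
    const weight = modelWeights[c.modelUsed] ?? 1.0;
    const positionBoost = c.isTopMarket ? 1.1 : 1.0;
    return sum + c.confidence * weight * positionBoost;
  }, 0);

  // Divide by the number of comparisons, not by the sum of the weights.
  return total / comparisons.length;
}

const sampleWeights = { gemma2: 1.3, qwen2_5: 0.9 };
const score = overallConfidence(
  [
    { modelUsed: "gemma2", confidence: 0.8, isTopMarket: true }, // 0.8 * 1.3 * 1.1 = 1.144
    { modelUsed: "qwen2_5", confidence: 0.8, isTopMarket: false }, // 0.8 * 0.9 * 1.0 = 0.72
    { modelUsed: "unknown_model", confidence: 0.5, isTopMarket: false }, // 0.5 * 1.0 * 1.0 = 0.5
  ],
  sampleWeights,
);

console.log(score.toFixed(3)); // "0.788", i.e. (1.144 + 0.72 + 0.5) / 3
```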

Key Features

Weighted Confidence
- Each model's output is weighted by its reliability factor
- Critical rankings receive a 10% boost
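- A simple average over the number of comparisons keeps the score stable; the weights are not normalized, so a score can exceed the raw confidence range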
Fallback Mechanism
- Default weight of 1.0 for unknown models
- Graceful handling of new or experimental models
Position-Based Boosting
- Extra weight for top market comparisons
- Ensures accuracy where it matters most
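
The fallback can also be written as a small typed lookup. This is a sketch under assumptions: `weightFor`, `isKnownModel`, and `DEFAULT_WEIGHT` are hypothetical names, not part of our codebase. Checking for an own property means that a model name which happens to match an inherited key such as `constructor` still receives the default weight instead of a non-numeric value.

```typescript
// Sketch only: weightFor, isKnownModel and DEFAULT_WEIGHT are hypothetical names.
const MODEL_WEIGHTS = {
  gemma2: 1.3,
  mistral: 1.2,
  llama3_2: 1.05,
  falcon3: 1.0,
  phi3_5: 0.95,
  qwen2_5: 0.9,
  marco_o1: 0.9,
} as const;

type KnownModel = keyof typeof MODEL_WEIGHTS;

const DEFAULT_WEIGHT = 1.0;

// Type guard: true only for keys defined directly on MODEL_WEIGHTS,
// never for properties inherited from Object.prototype.
function isKnownModel(name: string): name is KnownModel {
  return Object.prototype.hasOwnProperty.call(MODEL_WEIGHTS, name);
}

// Returns the configured weight, or the default for new or experimental models.
function weightFor(name: string): number {
  return isKnownModel(name) ? MODEL_WEIGHTS[name] : DEFAULT_WEIGHT;
}
```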

Results and Impact

The weighted system has improved results in three areas:

Accuracy
- 15% reduction in ranking reversals
- More stable confidence scores
- Better alignment with manual analysis
Efficiency
- Faster convergence to final rankings
- Reduced need for comparison repetition
- More effective use of advanced models
Resource Optimization
- Better allocation of model usage
- Reduced overall API calls
- Improved cost-effectiveness

Future Improvements

Dynamic Weight Adjustment
- Implement performance-based weight updates
- Track success rates over time
- Adapt to changing market conditions
Enhanced Validation
- Add cross-timeframe validation
- Implement confidence correlation tracking
- Develop automated accuracy metrics
Model Specialization
- Identify model strengths per indicator
- Create indicator-specific weights
- Optimize model selection per analysis type
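
As one possible shape for performance-based weight updates, the sketch below nudges a model's weight after each verified comparison. It is not implemented in the current system: `updateWeight`, the learning rate of 0.05, and the 0.5 to 1.5 bounds are all hypothetical choices for illustration.

```typescript
// Hypothetical sketch of performance-based weight updates. The function name,
// learning rate and bounds are illustrative assumptions.
interface WeightBounds {
  min: number;
  max: number;
}

function updateWeight(
  currentWeight: number,
  wasCorrect: boolean,
  learningRate = 0.05,
  bounds: WeightBounds = { min: 0.5, max: 1.5 },
): number {
  // Move a small step toward the upper bound after a correct comparison
  // and toward the lower bound after an incorrect one.
  const target = wasCorrect ? bounds.max : bounds.min;
  const next = currentWeight + learningRate * (target - currentWeight);
  // Clamp so a long streak can never push the weight outside the bounds.
  return Math.min(bounds.max, Math.max(bounds.min, next));
}

// Usage: fold a history of outcomes into an adjusted weight, starting at 1.0.
const outcomes = [true, true, false];
const adjustedWeight = outcomes.reduce((w, ok) => updateWeight(w, ok), 1.0);
```

A small learning rate keeps any single result from swinging a weight, which matters because each weight feeds directly into the overall confidence score.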

Conclusion

This data-driven approach to model weighting has improved our market analysis system. By weighting each model according to its observed strengths and weaknesses, we've built a more robust and accurate ranking system that makes better use of our available resources.
