Optimizing Market Analysis with Model Weights: A Data-Driven Approach
Introduction
In our ongoing effort to improve the accuracy of our market analysis, we studied how well each of our LLMs interprets technical analysis data. This post details our findings and the weight optimization strategy that came out of them.
Model Performance Analysis
We evaluated each model's ability to correctly interpret and compare technical indicators, focusing on:
- Accuracy in numeric interpretation
- Consistency in indicator analysis
- Proper understanding of technical patterns
- Alignment with established TA principles
Detailed Findings
export const MODEL_WEIGHTS = {
  gemma2: 1.3,    // Most accurate (no numeric mix-ups, solid alignment)
  mistral: 1.2,   // Very close second (correct on ADX/MACD but slight ATR confusion)
  llama3_2: 1.05, // Third: decent structure, minor RSI "oversold" mislabel
  falcon3: 1.0,   // Baseline among the remainder
  phi3_5: 0.95,   // More factual mismatches
  qwen2_5: 0.9,   // Inverted ADX values, calling RSI ~59 "oversold"
  marco_o1: 0.9,  // Good structure but also inverts ADX
} as const;
Rationale
- gemma2 (1.3)
  - Highest accuracy in technical analysis interpretation
  - No contradictory data inversions
  - Properly handles MACD crossovers and ADX strength comparisons
  - Maintains internal consistency across different indicators
- mistral (1.2)
  - Strong performance in ADX and MACD interpretation
  - Minor confusion with ATR references
  - Generally reliable for trend confirmation
  - Good balance of accuracy and speed
- llama3_2 (1.05)
  - Solid overall structure in analysis
  - Minor issues with RSI threshold interpretation
  - Good pattern recognition capabilities
  - Reliable for basic trend analysis
- falcon3 (1.0)
  - Chosen as the baseline model
  - Average performance across indicators
  - Consistent but not exceptional
  - Good foundation for comparison
- Lower-weighted Models (0.9-0.95)
  - More frequent factual errors
  - Issues with indicator value inversions
  - Confusion with technical thresholds
  - Still valuable for consensus building
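To illustrate why the lower-weighted models are still worth keeping, here is a minimal sketch of a weighted vote. It is an illustration only: the `Verdict` type, the `weightedVote` helper, and the sample votes are hypothetical and are not part of our pipeline. The weights are the ones listed above.

```typescript
// Weights copied from MODEL_WEIGHTS above.
const VOTE_WEIGHTS: Record<string, number> = {
  gemma2: 1.3,
  mistral: 1.2,
  llama3_2: 1.05,
  falcon3: 1.0,
  phi3_5: 0.95,
  qwen2_5: 0.9,
  marco_o1: 0.9,
};

// Hypothetical verdict type, used only for this illustration.
type Verdict = "bullish" | "bearish";

// Sum the weight behind each verdict; unknown models count as 1.0.
function weightedVote(
  votes: Array<{ model: string; verdict: Verdict }>
): Record<Verdict, number> {
  const tally: Record<Verdict, number> = { bullish: 0, bearish: 0 };
  for (const vote of votes) {
    tally[vote.verdict] += VOTE_WEIGHTS[vote.model] ?? 1.0;
  }
  return tally;
}

// Made-up votes: the top model disagrees with two lower-weighted ones.
const voteTally = weightedVote([
  { model: "gemma2", verdict: "bearish" },  // 1.3
  { model: "phi3_5", verdict: "bullish" },  // 0.95
  { model: "qwen2_5", verdict: "bullish" }, // 0.9
]);

console.log(voteTally);
```

In this made-up case the two lower-weighted models together (0.95 + 0.9 = 1.85) outweigh gemma2 alone (1.3), so agreement among weaker models can still move the consensus.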
Implementation Details
The weights are applied in our confidence calculation:
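private calculateOverallConfidence(state: GistState): number {
  // No comparisons means there is nothing to be confident about.
  if (state.comparisons.length === 0) return 0;

  const total = state.comparisons.reduce((sum, comparison) => {
    // Apply the model weight, defaulting to 1.0 for unknown models.
    // `??` (rather than `||`) keeps an explicit weight of 0 from being replaced by 1.0.
    const weight = state.config.modelWeights[comparison.modelUsed] ?? 1.0;
    // Position-based boost (10%) for critical rankings
    const positionBoost = this.isTopMarketComparison(comparison, state) ? 1.1 : 1.0;
    return sum + comparison.confidence * weight * positionBoost;
  }, 0);

  // Average over the number of comparisons (not the sum of weights).
  return total / state.comparisons.length;
}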
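To make the arithmetic concrete, here is a standalone sketch of the same calculation with a few made-up comparisons. The `Comparison` shape and the `isTopMarket` flag (a stand-in for `isTopMarketComparison`) are assumptions made for this example; the real `GistState` likely carries more fields.

```typescript
// Hypothetical minimal shape for one comparison.
interface Comparison {
  modelUsed: string;
  confidence: number;   // assumed to lie in [0, 1]
  isTopMarket: boolean; // stand-in for isTopMarketComparison()
}

// A subset of the weights listed earlier in the post.
const modelWeights: Record<string, number> = {
  gemma2: 1.3,
  mistral: 1.2,
  qwen2_5: 0.9,
};

function overallConfidence(comparisons: Comparison[]): number {
  if (comparisons.length === 0) return 0;
  const total = comparisons.reduce((sum, c) => {
    const weight = modelWeights[c.modelUsed] ?? 1.0; // unknown models get 1.0
    const positionBoost = c.isTopMarket ? 1.1 : 1.0; // 10% boost for top markets
    return sum + c.confidence * weight * positionBoost;
  }, 0);
  return total / comparisons.length;
}

// Made-up sample data for illustration.
const sample: Comparison[] = [
  { modelUsed: "gemma2", confidence: 0.8, isTopMarket: true },         // 0.8 * 1.3 * 1.1 = 1.144
  { modelUsed: "qwen2_5", confidence: 0.8, isTopMarket: false },       // 0.8 * 0.9 * 1.0 = 0.72
  { modelUsed: "unknown_model", confidence: 0.5, isTopMarket: false }, // 0.5 * 1.0 * 1.0 = 0.5
];

// (1.144 + 0.72 + 0.5) / 3 = 0.788
console.log(overallConfidence(sample).toFixed(3)); // prints "0.788"
```

The same raw confidence of 0.8 contributes 1.144 when it comes from gemma2 on a top market comparison and only 0.72 when it comes from qwen2_5 elsewhere.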
Key Features
- Weighted Confidence
  - Each model's output is weighted by its reliability factor
  - Critical rankings receive a 10% boost
  - The weighted sum is divided by the number of comparisons rather than the sum of weights, which keeps the calculation simple but means the result can exceed the raw confidence range
- Fallback Mechanism
  - Default weight of 1.0 for unknown models
  - Graceful handling of new or experimental models
- Position-Based Boosting
  - Extra weight for top market comparisons
  - Ensures accuracy where it matters most
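One property of the averaging step is worth spelling out: since the weighted sum is divided by the comparison count, a high raw confidence combined with a high weight and the position boost can push the overall score above 1. If a bounded score is needed, one option is to divide by the total effective weight instead. The sketch below compares the two; `normalizedAverage` is an alternative for illustration, not what our code does, and the `WeightedItem` type and sample values are hypothetical.

```typescript
// Hypothetical shape: weight here means model weight times position boost.
interface WeightedItem {
  confidence: number; // assumed to lie in [0, 1]
  weight: number;
}

// Mirrors the calculation in the post: weighted sum divided by the item count.
function countAverage(items: WeightedItem[]): number {
  if (items.length === 0) return 0;
  const total = items.reduce((sum, i) => sum + i.confidence * i.weight, 0);
  return total / items.length;
}

// Alternative: weighted sum divided by the sum of weights.
// The result stays within the range of the raw confidences.
function normalizedAverage(items: WeightedItem[]): number {
  const weightSum = items.reduce((sum, i) => sum + i.weight, 0);
  if (weightSum === 0) return 0;
  const total = items.reduce((sum, i) => sum + i.confidence * i.weight, 0);
  return total / weightSum;
}

// Made-up inputs: gemma2 on a top market comparison, plus mistral.
const items: WeightedItem[] = [
  { confidence: 1.0, weight: 1.3 * 1.1 }, // contributes 1.43
  { confidence: 0.9, weight: 1.2 },       // contributes 1.08
];

console.log(countAverage(items));      // (1.43 + 1.08) / 2, which is above 1
console.log(normalizedAverage(items)); // 2.51 / 2.63, which stays below 1
```

Which form is preferable depends on how the score is consumed: the count-based average is fine for ranking comparisons against each other, while the normalized form is easier to read as a probability-like value.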
Results and Impact
Since we switched to the weighted system, we have seen improvements in three areas:
- Accuracy
  - 15% reduction in ranking reversals
  - More stable confidence scores
  - Better alignment with manual analysis
- Efficiency
  - Faster convergence to final rankings
  - Reduced need for comparison repetition
  - More effective use of advanced models
- Resource Optimization
  - Better allocation of model usage
  - Reduced overall API calls
  - Improved cost-effectiveness
Future Improvements
- Dynamic Weight Adjustment
  - Implement performance-based weight updates
  - Track success rates over time
  - Adapt to changing market conditions
- Enhanced Validation
  - Add cross-timeframe validation
  - Implement confidence correlation tracking
  - Develop automated accuracy metrics
- Model Specialization
  - Identify model strengths per indicator
  - Create indicator-specific weights
  - Optimize model selection per analysis type
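As a rough idea of what dynamic weight adjustment could look like, the sketch below tracks each model's success rate with an exponential moving average and maps it linearly onto a weight range. Nothing here is implemented yet: the function names, the smoothing factor `alpha`, and the choice of a linear mapping are all assumptions, and the default range of 0.9 to 1.3 simply mirrors the span of our current static weights.

```typescript
// Hypothetical update rule: exponential moving average of correctness.
// A higher alpha reacts faster to recent outcomes but is noisier.
function updateSuccessRate(prev: number, wasCorrect: boolean, alpha = 0.1): number {
  return (1 - alpha) * prev + alpha * (wasCorrect ? 1 : 0);
}

// Map a success rate in [0, 1] linearly onto [minWeight, maxWeight].
// The defaults mirror the current static weight range (assumption).
function weightFromSuccessRate(rate: number, minWeight = 0.9, maxWeight = 1.3): number {
  const clamped = Math.min(1, Math.max(0, rate));
  return minWeight + clamped * (maxWeight - minWeight);
}

// Made-up outcome history for one model, starting from a neutral 0.5.
let rate = 0.5;
for (const outcome of [true, true, false, true]) {
  rate = updateSuccessRate(rate, outcome);
}

console.log(rate, weightFromSuccessRate(rate));
```

Because each update only needs the previous rate and the latest outcome, this approach would not require storing the full history per model. How "correct" is judged for a given comparison is the harder open question.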
Conclusion
This data-driven approach to model weighting has improved our market analysis system, most visibly through fewer ranking reversals and more stable confidence scores. By weighting each model according to its observed strengths and weaknesses, we've built a more robust ranking system that makes better use of the models and resources we have.