
LLM Bullshit Detection: Tools and Approaches for Technical Analysis

Max Kaido
Architect

Following our observations about LLM behavior in technical analysis, a critical question emerges: How do we systematically detect when an LLM is confidently wrong or producing nonsensical analysis? This post explores the tools and approaches available for this challenge.

Available Tools and Approaches

1. Automated Validation Tools

1.1 Internal Consistency Checks

  • Ranking System Invariants
    • The order in which markets are presented should not affect the final ranking
    • Shuffling the input markets and re-running should produce the same ranking
    • Transitive relationships must hold (if A > B and B > C, then A > C)
  • Numerical Validation
    • The conventional RSI thresholds (below 30 oversold, above 70 overbought) are well-defined and can be checked programmatically
    • Volume comparisons should maintain mathematical consistency
    • Price relationships must follow basic arithmetic rules
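The ranking invariants are cheap to test mechanically. Here is a minimal sketch, assuming the LLM ranking step can be wrapped in a callable that takes a list of market names and returns them ranked, plus a pairwise `prefers(a, b)` callable. Both wrappers and all names are hypothetical, not part of any existing pipeline.

```python
import random
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Hypothetical wrapper type: takes market names, returns them ranked best-first.
Ranker = Callable[[Sequence[str]], List[str]]


def is_order_invariant(rank: Ranker, markets: Sequence[str],
                       trials: int = 20, seed: int = 0) -> bool:
    """Return True if shuffling the input never changes the ranking."""
    rng = random.Random(seed)
    baseline = rank(markets)
    for _ in range(trials):
        shuffled = list(markets)
        rng.shuffle(shuffled)
        if rank(shuffled) != baseline:
            return False
    return True


def transitivity_violations(prefers: Callable[[str, str], bool],
                            markets: Sequence[str]) -> List[Tuple[str, str, str]]:
    """List triples (a, b, c) where a > b and b > c but not a > c."""
    return [
        (a, b, c)
        for a, b, c in permutations(markets, 3)
        if prefers(a, b) and prefers(b, c) and not prefers(a, c)
    ]
```

A ranker that simply echoes its input order fails the first check, and any cyclic preference (A over B, B over C, C over A) shows up in the second. With a real LLM behind `rank`, each trial costs one model call, so `trials` is a budget decision.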

1.2 Cross-Model Validation

  • Open Source Models
    • Different architectures might have different failure modes
    • Can use smaller, specialized models for specific validations
    • Cost-effective for initial screening
  • Commercial APIs
    • OpenAI/Anthropic as high-precision validators
    • Use sparingly due to cost constraints
    • Reserve for critical validations or tie-breaking
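One way to read the split between cheap and expensive validators is as a tiered vote: the cheap models go first, and the commercial API is only called when they disagree. The sketch below assumes each validator is a callable returning True when the analysis looks valid. The validators themselves are placeholders; no real API is called here.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical validator type: True means "the analysis looks valid".
Validator = Callable[[str], bool]


def tiered_verdict(analysis: str, cheap: Sequence[Validator],
                   expensive: Validator) -> Tuple[bool, bool]:
    """Return (verdict, escalated).

    The cheap validators vote first. The expensive validator is called
    only as a tie-breaker, when the cheap ones are not unanimous
    (or when there are none).
    """
    votes = [validate(analysis) for validate in cheap]
    if votes and all(votes):
        return True, False
    if votes and not any(votes):
        return False, False
    return expensive(analysis), True
```

Tracking how often `escalated` comes back True gives a direct estimate of the API spend this scheme implies.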

1.3 Time-Series Validation

  • Historical Consistency
    • Successive analyses of the same market should not fluctuate wildly
    • Trend changes should correlate with significant market events
    • Volume profile interpretations should be temporally consistent
  • Technical Indicator Math
    • Automated validation of indicator calculations
    • Cross-checking indicator relationships
    • Detecting mathematically impossible claims
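Indicator math is the easiest place to catch a confident error, because the claimed value can simply be recomputed. A minimal sketch, assuming the LLM reports a numeric RSI and we have the closing prices it saw. It uses Wilder's smoothing with a 14-period default; the 1-point tolerance and the choice to return 50 for a perfectly flat series are assumptions of this example.

```python
from typing import List, Sequence


def wilder_rsi(closes: Sequence[float], period: int = 14) -> float:
    """Compute RSI over `closes` using Wilder's smoothing."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    # Seed with simple averages, then apply Wilder's recursive smoothing.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0:
        # No losses at all: 100 if there were gains, 50 for a flat series
        # (the flat-series value is a convention chosen for this sketch).
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def check_rsi_claim(claimed: float, closes: Sequence[float],
                    period: int = 14, tolerance: float = 1.0) -> List[str]:
    """Return a list of problems with a claimed RSI value (empty if none)."""
    problems = []
    if not 0.0 <= claimed <= 100.0:
        problems.append("claimed RSI is outside [0, 100]")
    actual = wilder_rsi(closes, period)
    if abs(claimed - actual) > tolerance:
        problems.append(
            f"claimed RSI {claimed:.1f} differs from recomputed {actual:.1f}"
        )
    return problems
```

Charting platforms differ slightly in how they seed the averages, so a nonzero tolerance avoids flagging claims that are merely computed with a different convention.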

2. Systematic Validation Approaches

2.1 Known-Answer Testing

  • Golden Dataset
    • Curated set of clear-cut technical analysis cases
    • Well-documented market conditions with expert analysis
    • Regular validation against these known cases
  • Edge Cases
    • Extreme market conditions
    • Corner cases in indicator values
    • Unusual volume profiles
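A golden dataset can start very small. The sketch below uses a handful of illustrative RSI cases, including the extremes 0 and 100 and values just inside each threshold. The cases are made up for this example, not real expert annotations, and `analyze` is a hypothetical stand-in for whatever function wraps the LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class GoldenCase:
    name: str
    rsi: float
    expected: str  # "oversold", "neutral", or "overbought"


# Illustrative cases only, using the conventional 30/70 thresholds.
GOLDEN_CASES = [
    GoldenCase("floor", 0.0, "oversold"),
    GoldenCase("deep oversold", 12.0, "oversold"),
    GoldenCase("just under 30", 29.9, "oversold"),
    GoldenCase("mid range", 50.0, "neutral"),
    GoldenCase("just over 70", 70.1, "overbought"),
    GoldenCase("ceiling", 100.0, "overbought"),
]


def failed_cases(analyze: Callable[[float], str],
                 cases: Sequence[GoldenCase] = GOLDEN_CASES) -> List[str]:
    """Return the names of golden cases where `analyze` gives the wrong label."""
    return [case.name for case in cases if analyze(case.rsi) != case.expected]


def reference_classifier(rsi: float) -> str:
    """Rule-based baseline the LLM output can be compared against."""
    if rsi < 30:
        return "oversold"
    if rsi > 70:
        return "overbought"
    return "neutral"
```

Running `failed_cases` on every prompt or model change turns the golden set into a regression test: any non-empty result names exactly which known answers broke.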

2.2 Probabilistic Validation

  • Ensemble Methods
    • Multiple models analyzing the same data
    • Confidence weighted by model reliability
    • Disagreement as a signal for deeper investigation
  • Statistical Bounds
    • Expected ranges for various metrics
    • Probability thresholds for extreme claims
    • Time-series based probability checks
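The ensemble idea reduces to a reliability-weighted vote in which weak agreement is itself a signal. A minimal sketch, assuming each model emits a single label and that we maintain a reliability weight per model (for instance, its accuracy on known-answer tests). The 0.7 agreement threshold is an arbitrary placeholder.

```python
from collections import defaultdict
from typing import Dict, Tuple


def weighted_vote(votes: Dict[str, str], reliability: Dict[str, float],
                  min_share: float = 0.7) -> Tuple[str, float, bool]:
    """Combine model labels into (winner, share, needs_review).

    `votes` maps model name to its label; `reliability` maps model name
    to a non-negative weight. `share` is the winner's fraction of total
    weight, and `needs_review` is True when that share is below `min_share`.
    """
    totals: Dict[str, float] = defaultdict(float)
    for model, label in votes.items():
        totals[label] += reliability.get(model, 0.0)
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("no votes with positive reliability")
    winner = max(totals, key=lambda label: totals[label])
    share = totals[winner] / total_weight
    return winner, share, share < min_share
```

For example, two models saying "bullish" with weights 0.5 and 0.25 against one saying "bearish" with weight 0.25 gives a share of 0.75, which passes the default threshold; an even split does not.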

2.3 Domain-Specific Rules

  • Technical Analysis Laws
    • RSI bounds and interpretation rules
    • Volume-price relationships
    • Trend definition requirements
  • Market Mechanics
    • Liquidity implications
    • Order book mechanics (for example, the best bid sits below the best ask)
    • Trading hour effects
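Many domain rules can be written as plain predicates over a structured claim. The sketch below assumes the LLM output has already been parsed into a dict with optional keys `rsi`, `rsi_label`, `best_bid`, `best_ask`, and `volume`. That schema is hypothetical, and the bid/ask rule assumes a single venue with a continuous order book.

```python
from typing import Any, Dict, List


def rule_violations(claim: Dict[str, Any]) -> List[str]:
    """Check a parsed claim against a few hard domain rules."""
    problems = []

    # Interpretation rule: the label must agree with the conventional 30/70 thresholds.
    rsi, label = claim.get("rsi"), claim.get("rsi_label")
    if rsi is not None and label is not None:
        if label == "oversold" and rsi >= 30:
            problems.append(f"labelled oversold but RSI is {rsi}")
        if label == "overbought" and rsi <= 70:
            problems.append(f"labelled overbought but RSI is {rsi}")

    # Order book rule: on a single venue the best bid is below the best ask.
    bid, ask = claim.get("best_bid"), claim.get("best_ask")
    if bid is not None and ask is not None and bid >= ask:
        problems.append(f"best bid {bid} is not below best ask {ask}")

    # Volume can never be negative.
    volume = claim.get("volume")
    if volume is not None and volume < 0:
        problems.append("negative volume")

    return problems
```

Missing keys are skipped rather than treated as failures, so the same function works on partial claims.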

3. Human-in-the-Loop Tools

(Used sparingly and strategically)

3.1 Expert Review Triggers

  • Anomaly Detection
    • Unusual pattern combinations
    • Unexpected confidence scores
    • Novel market behavior
  • Strategic Sampling
    • Regular audit of high-impact decisions
    • Review of edge cases for learning
    • Validation of new patterns
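To keep human review sparing, the triggers can be made explicit. A minimal sketch, assuming each analysis carries a confidence score and we keep a window of recent scores. The z-score threshold of 3 and the 5% audit rate are placeholder values, not recommendations.

```python
import random
import statistics
from typing import List, Optional, Sequence


def review_reasons(confidence: float, recent: Sequence[float],
                   high_impact: bool = False, z_threshold: float = 3.0,
                   sample_rate: float = 0.05,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Return the reasons (possibly none) to send an analysis to an expert."""
    reasons = []

    # Anomaly trigger: confidence far from what this pipeline usually reports.
    if len(recent) >= 2:
        mean = statistics.fmean(recent)
        stdev = statistics.stdev(recent)
        if stdev > 0 and abs(confidence - mean) / stdev > z_threshold:
            reasons.append("unexpected confidence score")

    # Strategic sampling: always audit high-impact decisions...
    if high_impact:
        reasons.append("high-impact decision")

    # ...and a small random fraction of everything else.
    source = rng if rng is not None else random
    if source.random() < sample_rate:
        reasons.append("random audit sample")

    return reasons
```

An empty list means no review is needed. Passing a seeded `random.Random` makes the sampling reproducible in tests.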

3.2 Feedback Loops

  • Error Cataloging
    • Systematic recording of detected errors
    • Pattern recognition in failure modes
    • Continuous refinement of detection rules
  • Model Retraining Signals
    • Identifying systematic errors
    • Collecting correction examples
    • Prioritizing improvement areas
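An error catalog does not need to be sophisticated to be useful. Here is a rough sketch of an in-memory version, where each detected error is stored under a failure-mode tag. The tag names are whatever the team decides on, and a real system would presumably persist this somewhere.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ErrorCatalog:
    # Each record is (failure_mode, example_text).
    records: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, failure_mode: str, example: str) -> None:
        """Store one detected error under its failure-mode tag."""
        self.records.append((failure_mode, example))

    def top_failure_modes(self, n: int = 3) -> List[Tuple[str, int]]:
        """Most frequent failure modes, to prioritize improvement work."""
        return Counter(mode for mode, _ in self.records).most_common(n)

    def examples(self, failure_mode: str) -> List[str]:
        """Collected examples for one failure mode, e.g. as correction data."""
        return [text for mode, text in self.records if mode == failure_mode]
```

The counts from `top_failure_modes` answer "what should we fix first", and `examples` gives the raw material for new detection rules or retraining data.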

4. Meta-Validation Tools

4.1 Process Validation

  • Decision Trees
    • Clear validation pathways
    • Documented decision points
    • Failure mode handling
  • Audit Trails
    • Complete reasoning chains
    • Model confidence tracking
    • Validation step logging
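An audit trail can be as simple as an append-only list of validation steps per analysis, serialized to JSON. The field names in this sketch are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AuditTrail:
    analysis_id: str
    steps: List[Dict[str, Any]] = field(default_factory=list)

    def log(self, check: str, passed: bool,
            confidence: Optional[float] = None, note: str = "") -> None:
        """Append one validation step in the order it was run."""
        self.steps.append(
            {"check": check, "passed": passed,
             "confidence": confidence, "note": note}
        )

    def passed(self) -> bool:
        """True only if every logged step passed."""
        return all(step["passed"] for step in self.steps)

    def to_json(self) -> str:
        """Serialize the whole trail for storage or later review."""
        return json.dumps(asdict(self), sort_keys=True)
```

Because steps are stored in order, the trail also documents which decision path was taken when a later check was skipped or failed.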

4.2 System Health Metrics

  • Validation Coverage
    • Percentage of decisions validated
    • Types of checks applied
    • Validation depth metrics
  • Error Rates
    • False positive tracking
    • False negative (miss rate) monitoring
    • Correlation between model confidence and actual correctness
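Coverage and error rates fall out of a simple record per decision. The sketch below assumes that ground truth (`actually_wrong`) eventually becomes available, for example from expert review. It computes coverage, false positive rate, and miss rate; the record layout is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Outcome:
    validated: bool       # did any check run on this decision?
    flagged: bool         # did validation flag it as wrong?
    actually_wrong: bool  # ground truth from later review


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Safe division: None when the denominator is zero."""
    return numerator / denominator if denominator else None


def health_metrics(outcomes: Sequence[Outcome]) -> Dict[str, Optional[float]]:
    """Compute coverage, false positive rate, and miss rate."""
    validated = [o for o in outcomes if o.validated]
    correct = [o for o in validated if not o.actually_wrong]
    wrong = [o for o in validated if o.actually_wrong]
    return {
        # Share of all decisions that received any validation.
        "coverage": _ratio(len(validated), len(outcomes)),
        # Correct analyses that validation flagged anyway.
        "false_positive_rate": _ratio(sum(o.flagged for o in correct), len(correct)),
        # Wrong analyses that validation let through.
        "miss_rate": _ratio(sum(not o.flagged for o in wrong), len(wrong)),
    }
```

A metric is reported as `None` rather than 0 when there is nothing to measure yet, so an empty log is not mistaken for a perfect one.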

Questions to Explore

  1. Tool Integration
    • How do we efficiently combine these tools?
    • What's the optimal validation sequence?
    • How do we handle tool conflicts?
  2. Resource Optimization
    • When to use expensive vs cheap validation?
    • How to minimize API costs?
    • What's the minimal effective validation set?
  3. Scalability Concerns
    • How does validation time affect real-time analysis?
    • Can we parallelize validation effectively?
    • What's the maintenance overhead?

Remember: This is an initial exploration of available tools. The next post will focus on how to combine these tools into an effective bullshit detection system.