LLM Bullshit Detection: Tools and Approaches for Technical Analysis

February 17, 2025 · 3 min read

Architect

Following our observations about LLM behavior in technical analysis, a critical question emerges: How do we systematically detect when an LLM is confidently wrong or producing nonsensical analysis? This post explores the tools and approaches available for this challenge.

Available Tools and Approaches

1. Automated Validation Tools

1.1 Internal Consistency Checks

Ranking System Invariants
- The order in which markets are presented should not affect the final ranking
- Shuffling the input list of markets should produce the same ranking
- Transitive relationships must hold (if A > B and B > C, then A > C)
Numerical Validation
- The conventional RSI thresholds (30 for oversold, 70 for overbought) are explicit numbers and can be checked programmatically
- Volume comparisons should maintain mathematical consistency
- Price relationships must follow basic arithmetic rules
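
The invariants above are cheap to test mechanically. The following is a minimal sketch, not a reference implementation: the `Ranker` signature, the function names, and the label strings `"oversold"` and `"overbought"` are assumptions made for illustration, and the strict `< 30` / `> 70` boundaries are one common convention.

```python
import itertools
import random
from typing import Callable, List, Sequence

# Hypothetical signature: takes market names, returns them ranked best-first.
Ranker = Callable[[Sequence[str]], List[str]]


def is_order_invariant(rank: Ranker, markets: Sequence[str],
                       trials: int = 5, seed: int = 0) -> bool:
    """Return True if `rank` gives the same ranking for every tested input order."""
    rng = random.Random(seed)
    baseline = rank(list(markets))
    # Always test the reversed order, then a few seeded random shuffles.
    candidates = [list(reversed(markets))]
    for _ in range(trials):
        shuffled = list(markets)
        rng.shuffle(shuffled)
        candidates.append(shuffled)
    return all(rank(c) == baseline for c in candidates)


def is_transitive(prefers: Callable[[str, str], bool],
                  items: Sequence[str]) -> bool:
    """Return False if some triple has A > B and B > C but not A > C."""
    for a, b, c in itertools.permutations(items, 3):
        if prefers(a, b) and prefers(b, c) and not prefers(a, c):
            return False
    return True


def rsi_label_is_consistent(rsi: float, label: str) -> bool:
    """Check a claimed label against the conventional 30/70 thresholds."""
    if not 0.0 <= rsi <= 100.0:
        return False  # RSI is bounded to [0, 100] by construction
    if label == "oversold":
        return rsi < 30.0
    if label == "overbought":
        return rsi > 70.0
    return True  # other labels are not constrained by this check
```

In practice `rank` and `prefers` would wrap calls to the LLM; here they are plain callables so the checks can be exercised without a model.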

1.2 Cross-Model Validation

Open Source Models
- Different architectures might have different failure modes
- Can use smaller, specialized models for specific validations
- Cost-effective for initial screening
Commercial APIs
- Commercial models (e.g. from OpenAI or Anthropic) as higher-precision validators
- Use sparingly due to cost constraints
- Reserve for critical validations or tie-breaking
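
One way to read the two tiers above is as an escalation policy: run the cheap validators first and pay for the expensive one only when they disagree. The sketch below assumes each validator is a callable from analysis text to a verdict string. No real API is called; `tiered_verdict` and `min_agreement` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical interface: analysis text in, verdict string out.
Validator = Callable[[str], str]


def tiered_verdict(analysis: str,
                   cheap: Sequence[Validator],
                   expensive: Validator,
                   min_agreement: float = 1.0) -> Tuple[str, bool]:
    """Return (verdict, escalated).

    The expensive validator is consulted only when the share of cheap
    validators backing the most common verdict falls below `min_agreement`.
    """
    if not cheap:
        return expensive(analysis), True
    votes = Counter(v(analysis) for v in cheap)
    verdict, count = votes.most_common(1)[0]
    if count / len(cheap) >= min_agreement:
        return verdict, False
    return expensive(analysis), True
```

With the default `min_agreement=1.0`, any disagreement among the cheap validators triggers escalation. Lowering it trades cost for a higher risk of accepting a wrong majority.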

1.3 Time-Series Validation

Historical Consistency
- Successive analyses of the same market should not fluctuate wildly without a corresponding change in the data
- Trend changes should correlate with significant market events
- Volume profile interpretations should be temporally consistent
Technical Indicator Math
- Automated validation of indicator calculations
- Cross-checking indicator relationships
- Detecting mathematically impossible claims
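
Indicator claims can be checked by recomputing the indicator from the raw prices. The sketch below uses the simple-average form of RSI (sometimes called Cutler's RSI) because it is short. A production check would need to match whichever smoothing the data source uses, for example Wilder's, since the values differ. The function names, the tolerance, and the choice to return 50 for a completely flat window are assumptions.

```python
from typing import Sequence


def simple_rsi(closes: Sequence[float], period: int = 14) -> float:
    """RSI over the last `period` price changes, using plain averages."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    window = list(closes[-(period + 1):])
    deltas = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(d for d in deltas if d > 0) / period
    avg_loss = sum(-d for d in deltas if d < 0) / period
    if avg_loss == 0:
        # No losses: RSI saturates at 100. A flat window is treated as 50
        # here, which is an assumption rather than a standard.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def check_rsi_claim(claimed: float, closes: Sequence[float],
                    period: int = 14, tolerance: float = 2.0) -> str:
    """Classify a claimed RSI as 'impossible', 'mismatch', or 'ok'."""
    if not 0.0 <= claimed <= 100.0:
        return "impossible"  # outside the indicator's mathematical range
    actual = simple_rsi(closes, period)
    return "ok" if abs(claimed - actual) <= tolerance else "mismatch"
```

For example, a steadily rising price series has no losing periods, so a claim that its RSI is 25 comes back as `"mismatch"`, and a claimed RSI of 140 is rejected as `"impossible"` without looking at the prices at all.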

2. Systematic Validation Approaches

2.1 Known-Answer Testing

Golden Dataset
- Curated set of clear-cut technical analysis cases
- Well-documented market conditions with expert analysis
- Regular validation against these known cases
Edge Cases
- Extreme market conditions
- Corner cases in indicator values
- Unusual volume profiles
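
A golden dataset can start as a short list of cases with unambiguous answers, plus a function that reports which ones the pipeline gets wrong. The cases below are illustrative values invented for this sketch, not real market data, and `analyze` stands in for whatever wraps the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class GoldenCase:
    name: str
    rsi: float
    expected: str  # the label a correct analysis must produce


# Illustrative cases only: clear-cut values plus the extremes of the range.
GOLDEN_CASES = [
    GoldenCase("deeply oversold", 12.0, "oversold"),
    GoldenCase("mid-range", 50.0, "neutral"),
    GoldenCase("strongly overbought", 88.0, "overbought"),
    GoldenCase("floor of the RSI range", 0.0, "oversold"),
    GoldenCase("ceiling of the RSI range", 100.0, "overbought"),
]


def failed_cases(analyze: Callable[[float], str],
                 cases: Sequence[GoldenCase]) -> List[str]:
    """Return the names of golden cases where `analyze` disagrees."""
    return [c.name for c in cases if analyze(c.rsi) != c.expected]
```

Running `failed_cases` on every model or prompt change turns the golden dataset into a regression test: an empty list means no known-answer case has broken.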

2.2 Probabilistic Validation

Ensemble Methods
- Multiple models analyzing same data
- Confidence weighted by model reliability
- Disagreement as a signal for deeper investigation
Statistical Bounds
- Expected ranges for various metrics
- Probability thresholds for extreme claims
- Time-series based probability checks
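
Both ideas above reduce to small calculations. The sketch below shows a reliability-weighted vote whose support score doubles as a disagreement signal, and a z-score check for claims outside the expected range. The function names, the weights, and the default threshold of 3 standard deviations are assumptions. The vote also assumes at least one positive weight.

```python
import statistics
from typing import Dict, Sequence, Tuple


def weighted_vote(votes: Dict[str, str],
                  reliability: Dict[str, float]) -> Tuple[str, float]:
    """Return (winning label, share of total reliability backing it).

    `votes` maps model name to label; `reliability` maps model name to weight.
    A low share means the models disagree and the case deserves a closer look.
    """
    totals: Dict[str, float] = {}
    for model, label in votes.items():
        totals[label] = totals.get(label, 0.0) + reliability[model]
    winner = max(totals, key=lambda label: totals[label])
    return winner, totals[winner] / sum(totals.values())


def is_outside_expected_range(value: float, history: Sequence[float],
                              max_z: float = 3.0) -> bool:
    """Flag a value more than `max_z` sample standard deviations from the mean."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)  # needs at least two data points
    if spread == 0:
        return value != mean
    return abs(value - mean) / spread > max_z
```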

2.3 Domain-Specific Rules

Technical Analysis Laws
- RSI bounds and interpretation rules
- Volume-price relationships
- Trend definition requirements
Market Mechanics
- Liquidity implications
- Order book physics
- Trading hour effects
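
Domain rules fit naturally into a small registry of named predicates applied to a structured claim. The sketch below assumes a claim is a dictionary with a `trend` label and a list of `bars` holding `high`, `low`, and `volume`. That shape is hypothetical. It also uses a deliberately simple uptrend definition (each bar makes a higher high and a higher low), whereas real definitions usually work on swing points.

```python
from typing import Any, Callable, Dict, List, Tuple

Claim = Dict[str, Any]  # hypothetical shape: {"trend": str, "bars": [...]}


def bars_are_sane(claim: Claim) -> bool:
    """Every bar must have high >= low and a non-negative volume."""
    return all(bar["high"] >= bar["low"] and bar["volume"] >= 0
               for bar in claim["bars"])


def uptrend_is_supported(claim: Claim) -> bool:
    """An 'uptrend' claim requires higher highs and higher lows bar to bar."""
    if claim["trend"] != "uptrend":
        return True  # rule only constrains uptrend claims
    bars = claim["bars"]
    return all(cur["high"] > prev["high"] and cur["low"] > prev["low"]
               for prev, cur in zip(bars, bars[1:]))


RULES: List[Tuple[str, Callable[[Claim], bool]]] = [
    ("bar sanity", bars_are_sane),
    ("uptrend needs higher highs and higher lows", uptrend_is_supported),
]


def violations(claim: Claim) -> List[str]:
    """Return the names of all rules the claim breaks."""
    return [name for name, rule in RULES if not rule(claim)]
```

Keeping rules as data makes it easy to add a new one when a new failure mode is found, without touching the code that runs them.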

3. Human-in-the-Loop Tools

(Used sparingly and strategically)

3.1 Expert Review Triggers

Anomaly Detection
- Unusual pattern combinations
- Unexpected confidence scores
- Novel market behavior
Strategic Sampling
- Regular audit of high-impact decisions
- Review of edge cases for learning
- Validation of new patterns
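
Since expert time is the scarce resource, the triggers above can be written as an explicit routing function. The sketch below covers two of them: an overconfidence check (stated confidence far above the share of automated checks passed) and random auditing of high-impact decisions. The `Decision` fields, the `gap` threshold, and the 5% audit rate are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Decision:
    confidence: float     # model's self-reported confidence, in [0, 1]
    checks_passed: float  # fraction of automated checks passed, in [0, 1]
    high_impact: bool


def needs_expert_review(decision: Decision, rng: random.Random,
                        audit_rate: float = 0.05, gap: float = 0.4) -> bool:
    """Decide whether a human should look at this decision."""
    # Anomaly trigger: the model is much more confident than the checks justify.
    if decision.confidence - decision.checks_passed > gap:
        return True
    # Strategic sampling: audit a random share of high-impact decisions.
    return decision.high_impact and rng.random() < audit_rate
```

Passing the random generator in explicitly keeps the sampling reproducible, which matters when an audit decision later has to be explained.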

3.2 Feedback Loops

Error Cataloging
- Systematic recording of detected errors
- Pattern recognition in failure modes
- Continuous refinement of detection rules
Model Retraining Signals
- Identifying systematic errors
- Collecting correction examples
- Prioritizing improvement areas
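
A feedback loop needs somewhere to put the errors it finds. The sketch below is one possible in-memory catalog that records each error with a failure-mode tag, ranks failure modes by frequency, and returns correction pairs for a given mode. The class name, the tag strings, and the record layout are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ErrorCatalog:
    # Each record is (failure_mode, bad_output, correction).
    records: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, failure_mode: str, bad_output: str, correction: str) -> None:
        """Store one detected error together with its corrected form."""
        self.records.append((failure_mode, bad_output, correction))

    def top_failure_modes(self, n: int = 3) -> List[Tuple[str, int]]:
        """Most frequent failure modes, to prioritize improvement work."""
        return Counter(mode for mode, _, _ in self.records).most_common(n)

    def correction_examples(self, failure_mode: str) -> List[Tuple[str, str]]:
        """(bad, corrected) pairs for one failure mode, e.g. for retraining."""
        return [(bad, fix) for mode, bad, fix in self.records
                if mode == failure_mode]
```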

4. Meta-Validation Tools

4.1 Process Validation

Decision Trees
- Clear validation pathways
- Documented decision points
- Failure mode handling
Audit Trails
- Complete reasoning chains
- Model confidence tracking
- Validation step logging
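
An audit trail can be as simple as one structured record per decision, serialized as a line of JSON. The sketch below captures the three items listed above (reasoning, confidence, and validation steps). The field names and the JSON-lines format are assumptions, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ValidationStep:
    check: str
    passed: bool
    detail: str = ""


@dataclass
class AuditRecord:
    decision_id: str
    reasoning: str     # the model's reasoning chain, stored verbatim
    confidence: float  # the model's stated confidence
    steps: List[ValidationStep] = field(default_factory=list)

    def log(self, check: str, passed: bool, detail: str = "") -> None:
        """Append the outcome of one validation step."""
        self.steps.append(ValidationStep(check, passed, detail))

    def to_json_line(self) -> str:
        """Serialize the whole record, nested steps included, as one JSON line."""
        return json.dumps(asdict(self), sort_keys=True)
```

Appending each `to_json_line()` result to a log file gives a trail that can be searched and replayed with ordinary tools.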

4.2 System Health Metrics

Validation Coverage
- Percentage of decisions validated
- Types of checks applied
- Validation depth metrics
Error Rates
- False positive tracking
- Miss rate monitoring
- Confidence correlation
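
The coverage and error-rate metrics above follow directly from per-decision outcomes once some of them have ground truth from later review. The sketch below computes coverage, false positive rate, and miss rate. The `Outcome` fields are hypothetical, and confidence correlation is left out because it needs confidence scores that this record does not carry.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Outcome:
    validated: bool                # did any validation run on this decision?
    flagged: bool                  # did validation flag it as suspect?
    actually_wrong: Optional[bool] # ground truth from review; None if unknown


def health_metrics(outcomes: Sequence[Outcome]) -> Dict[str, float]:
    """Compute coverage, false positive rate, and miss rate."""
    validated = [o for o in outcomes if o.validated]
    labeled = [o for o in validated if o.actually_wrong is not None]
    correct = [o for o in labeled if not o.actually_wrong]
    wrong = [o for o in labeled if o.actually_wrong]
    return {
        # Share of all decisions that went through validation.
        "coverage": len(validated) / len(outcomes) if outcomes else 0.0,
        # Share of correct analyses that were flagged anyway.
        "false_positive_rate": (sum(o.flagged for o in correct) / len(correct)
                                if correct else 0.0),
        # Share of wrong analyses that validation failed to flag.
        "miss_rate": (sum(not o.flagged for o in wrong) / len(wrong)
                      if wrong else 0.0),
    }
```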

Questions to Explore

Tool Integration
- How do we efficiently combine these tools?
- What's the optimal validation sequence?
- How do we handle tool conflicts?
Resource Optimization
- When to use expensive vs cheap validation?
- How to minimize API costs?
- What's the minimal effective validation set?
Scalability Concerns
- How does validation time affect real-time analysis?
- Can we parallelize validation effectively?
- What's the maintenance overhead?

Remember: This is an initial exploration of available tools. The next post will focus on how to combine these tools into an effective bullshit detection system.
