BS Detection Module: Evaluation Criteria
When evaluating approaches to bullshit (BS) detection in LLM outputs, we need clear criteria that prioritize practical value over theoretical elegance. This post outlines those criteria.
Core Evaluation Principles
1. Results Over Reasoning
- Money Talks
  - Perfect reasoning with poor returns = Failure
  - "Wrong" reasoning with consistent profits = Success
  - Correlation with profit is the ultimate metric
  - Beautiful explanations don't pay bills
- Practical Implications
  - Don't waste resources perfecting explanations that don't impact returns
  - Focus validation on decision quality, not reasoning quality
  - Use profit as the north star for all improvements
  - Consider reasoning quality only if it affects final decisions
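One rough way to tie a validator to profit, as this principle asks, is to compare the realized PnL of decisions it accepted against those it flagged. The sketch below is illustrative only: the `(passed_validation, realized_pnl)` record shape, the `profit_lift` name, and the sample numbers are assumptions, not part of any existing module.

```python
from statistics import mean

def profit_lift(records):
    """Mean PnL of accepted decisions minus mean PnL of flagged decisions.

    `records` is a list of (passed_validation, realized_pnl) pairs.
    This shape is an assumption made for illustration.
    """
    accepted = [pnl for passed, pnl in records if passed]
    flagged = [pnl for passed, pnl in records if not passed]
    if not accepted or not flagged:
        return None  # not enough data to compare the two groups
    return mean(accepted) - mean(flagged)

# Made-up sample history: three accepted decisions, two flagged ones.
history = [(True, 120.0), (True, -30.0), (False, -80.0), (False, 10.0), (True, 60.0)]
print(profit_lift(history))  # 85.0
```

A lift near zero or below suggests the validator is not earning its keep, however sound its reasoning looks.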
2. Fundamental Over Configurable
- Signal-Agnostic Validation
  - BS detection should not depend on specific indicator thresholds
  - "RSI > 70 vs RSI > 60" is not a BS detection problem
  - Signal generation logic belongs in trading strategy, not validation
  - Focus on universal properties, not specific interpretations
- Binary Over Spectrum
  - Validation should trend towards yes/no decisions
  - "Is this mathematically possible?" over "Is this a good signal?"
  - Reduce dependency on configurable thresholds
  - Clear boundaries between validation and strategy
- Why This Matters
  - Keeps BS detection module focused and maintainable
  - Prevents confusion between validation and signal generation
  - Makes validation more robust across different strategies
  - Reduces risk of over-fitting validation to current market conditions
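To make the validation/strategy boundary concrete, here is a minimal sketch of a binary, signal-agnostic check. It only asks whether claimed values are possible by definition (RSI is bounded to 0–100, a probability to 0–1) and never whether an RSI of 70 or 60 is a good signal. The `claim` dictionary and its keys are assumptions for illustration.

```python
import math

def _in_range(value, low, high):
    """True if value is a finite number within [low, high]."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
        and low <= value <= high
    )

def is_mathematically_possible(claim: dict) -> bool:
    """Yes/no answer: could these claimed values exist at all?

    Only definitional bounds are checked. Whether a value is a *good*
    signal is a strategy question and deliberately out of scope.
    """
    if "rsi" in claim and not _in_range(claim["rsi"], 0.0, 100.0):
        return False
    if "probability" in claim and not _in_range(claim["probability"], 0.0, 1.0):
        return False
    return True

print(is_mathematically_possible({"rsi": 72.5}))         # True
print(is_mathematically_possible({"rsi": 130}))          # False
print(is_mathematically_possible({"probability": 1.2}))  # False
```

Note that nothing here is tunable: the bounds come from how the quantities are defined, not from current market conditions.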
3. Automation Over Human Intervention
- Distance-Ready Systems
  - System should work without human babysitting
  - Validation should be fundamental, not point-in-time
  - Approaches should scale across time and market conditions
  - Minimize dependency on human fine-tuning
- Value of Invariants
  - Mathematical properties that always hold
  - Market physics that can't be violated
  - Relationships that persist across market conditions
  - Rules that don't need human verification
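As one example of such invariants, the relationships inside a single OHLC price bar hold regardless of market regime: the high can never be below the low, the open, or the close. The sketch below checks an LLM-reported bar against them. The `Bar` type and `violated_invariants` helper are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float

def violated_invariants(bar: Bar) -> List[str]:
    """Return the names of invariants this bar breaks (empty list = consistent)."""
    violations = []
    if bar.high < bar.low:
        violations.append("high < low")
    if bar.high < max(bar.open, bar.close):
        violations.append("high below open/close")
    if bar.low > min(bar.open, bar.close):
        violations.append("low above open/close")
    if bar.volume < 0:
        violations.append("negative volume")
    return violations

print(violated_invariants(Bar(100, 105, 99, 104, 1000)))  # []
print(violated_invariants(Bar(100, 98, 99, 104, 1000)))   # ['high < low', 'high below open/close']
```

None of these rules needs a human to re-verify them next quarter, which is what makes them suitable for unattended operation.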
4. Efficiency Over Completeness
- Resource Allocation
  - Better to catch 80% of BS with 20% of resources
  - Focus on high-impact validation first
  - Avoid diminishing returns in validation depth
  - Keep validation overhead minimal
- Speed Considerations
  - Fast imperfect validation > Slow perfect validation
  - Validation should not become a bottleneck
  - Quick feedback loops enable faster iteration
  - Time is a critical resource in trading
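A simple way to act on this is to run the cheapest checks first and stop at the first failure, so expensive checks only run on outputs that survive the cheap ones. The following is a sketch under stated assumptions: the `Check` tuple, the hand-assigned `relative_cost` values, and the example checks (including the `matches_price_history` stub) are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional

class Check(NamedTuple):
    name: str
    relative_cost: int  # rough, hand-assigned estimate; lower runs first
    run: Callable[[dict], bool]

def first_failure(claim: dict, checks: List[Check]) -> Optional[str]:
    """Run checks cheapest-first and return the name of the first that fails."""
    for check in sorted(checks, key=lambda c: c.relative_cost):
        if not check.run(claim):
            return check.name
    return None  # every check passed

expensive_calls = []  # records each time the costly check actually runs

def matches_price_history(claim: dict) -> bool:
    # Stand-in for a slow lookup (hypothetical); always passes here.
    expensive_calls.append(claim)
    return True

CHECKS = [
    Check("matches_price_history", 50, matches_price_history),
    Check("has_known_action", 1, lambda c: c.get("action") in {"buy", "sell", "hold"}),
    Check("size_is_positive", 2,
          lambda c: isinstance(c.get("size"), (int, float)) and c["size"] > 0),
]

print(first_failure({"action": "yolo", "size": 10}, CHECKS))  # has_known_action
print(first_failure({"action": "buy", "size": 10}, CHECKS))   # None
```

In the first call the costly check never runs, because the cheap `has_known_action` check already rejected the claim.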
5. Robustness Over Sophistication
- Simple > Complex
  - Fewer moving parts = Fewer failure points
  - Simple rules are easier to verify
  - Complex validation can introduce its own bugs
  - Maintenance cost matters
- Failure Modes
  - Clear failure states are better than ambiguous ones
  - System should fail predictably
  - Recovery should be automatic where possible
  - Failure detection should be straightforward
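Clear failure states can be as simple as a three-valued verdict that separates "the output is BS" from "the validator itself broke". The sketch below is one possible shape, not a prescribed design: `Verdict`, `run_check`, and `is_actionable` are names invented for illustration, and treating errors as non-actionable (failing closed) is an assumption about the desired policy.

```python
from enum import Enum

class Verdict(Enum):
    PASS = "pass"    # output is consistent
    FAIL = "fail"    # output is BS
    ERROR = "error"  # the check itself could not run

def run_check(check, claim) -> Verdict:
    """Run one check and map every outcome to an explicit state."""
    try:
        return Verdict.PASS if check(claim) else Verdict.FAIL
    except Exception:
        # A crashing validator is reported as ERROR, never silently as PASS.
        return Verdict.ERROR

def is_actionable(verdict: Verdict) -> bool:
    """Fail closed: only an explicit PASS lets a decision through."""
    return verdict is Verdict.PASS

def rsi_in_bounds(claim):
    return 0 <= claim["rsi"] <= 100

print(run_check(rsi_in_bounds, {"rsi": 50}))   # Verdict.PASS
print(run_check(rsi_in_bounds, {"rsi": 150}))  # Verdict.FAIL
print(run_check(rsi_in_bounds, {}))            # Verdict.ERROR
```

Counting `ERROR` verdicts separately also gives a straightforward health signal for the validator itself.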
6. Evolution Over Revolution
- Incremental Improvement
  - Start with basic validations that work
  - Add complexity only when proven necessary
  - Keep working parts while improving others
  - Build on what generates value
- Learning System
  - System should get better with more data
  - Failures should inform improvements
  - Adaptation should be mostly automatic
  - Human insights should become system features
Questions for Evaluation
- Value Focus
  - Does this validation directly impact profit?
  - What's the cost/benefit ratio?
  - Can we measure its impact on returns?
  - Is it solving a real problem?
- Automation Potential
  - Can this run without human oversight?
  - Are the rules fundamentally sound?
  - How often will it need updating?
  - What are the maintenance requirements?
- Resource Usage
  - What's the computational cost?
  - How does it scale with market complexity?
  - What's the development overhead?
  - Is the validation cost justified?
- Reliability Assessment
  - How does it fail?
  - What are the edge cases?
  - Can it recover automatically?
  - How do we monitor its health?
Red Flags in Approaches
- Resource Traps
  - Heavy reliance on human review
  - Complex validation chains
  - High computational overhead
  - Frequent manual updates needed
- False Sophistication
  - Over-engineered solutions
  - Unnecessary precision
  - Too many configuration options
  - Complex failure modes
- Missing Fundamentals
  - No clear connection to profit
  - Can't run autonomously
  - Requires constant tuning
  - Not based on invariants
Remember: The goal is not to build a perfect BS detector, but one that effectively contributes to profitable trading decisions while maintaining operational efficiency.
