BS Detection Module: Evaluation Criteria

February 17, 2025 · 3 min read

Architect

When evaluating approaches for bullshit detection in LLM outputs, we need clear criteria that prioritize practical value over theoretical elegance. This post outlines those criteria for a setting where the outputs feed trading decisions.

Core Evaluation Principles

1. Results Over Reasoning

Money Talks
- Perfect reasoning with poor returns = Failure
- "Wrong" reasoning with consistent profits = Success
- Correlation with profit is the ultimate metric
- Beautiful explanations don't pay bills
Practical Implications
- Don't waste resources perfecting explanations that don't impact returns
- Focus validation on decision quality, not reasoning quality
- Use profit as the north star for all improvements
- Consider reasoning quality only if it affects final decisions
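One way to make "correlation with profit" measurable is to compare the realized PnL of outputs the validator accepted against those it rejected. The sketch below uses a plain difference in means as a simple stand-in for a correlation measure. The function name `profit_lift` and the sample numbers are hypothetical and exist only for illustration.

```python
from statistics import mean

def profit_lift(decisions, pnl):
    """Mean PnL of accepted outputs minus mean PnL of rejected outputs.

    decisions: list of bools, True if the validator accepted the output.
    pnl: realized profit or loss that followed each output (assumed known).
    """
    accepted = [p for d, p in zip(decisions, pnl) if d]
    rejected = [p for d, p in zip(decisions, pnl) if not d]
    if not accepted or not rejected:
        return None  # nothing to compare if one group is empty
    return mean(accepted) - mean(rejected)

# Illustrative data only, not real results.
decisions = [True, True, False, True, False]
pnl = [1.5, 0.5, -2.0, 1.0, -1.0]
lift = profit_lift(decisions, pnl)
```

A lift near zero suggests the validation step is not affecting decision quality, however sound its reasoning looks.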

2. Fundamental Over Configurable

Signal-Agnostic Validation
- BS detection should not depend on specific indicator thresholds
- "RSI > 70 vs RSI > 60" is not a BS detection problem
- Signal generation logic belongs in trading strategy, not validation
- Focus on universal properties, not specific interpretations
Binary Over Spectrum
- Validation should trend towards yes/no decisions
- "Is this mathematically possible?" over "Is this a good signal?"
- Reduce dependency on configurable thresholds
- Clear boundaries between validation and strategy
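A binary, signal-agnostic check might look like the sketch below. It only asks whether a claimed value can exist at all, so it gives the same answer whether the strategy uses RSI > 70 or RSI > 60. The `BOUNDS` table and the function name are assumptions made for this example; the only domain fact relied on is that RSI is bounded to [0, 100] and a probability to [0, 1].

```python
import math

# Hypothetical table of quantities whose range is fixed by definition.
BOUNDS = {
    "rsi": (0.0, 100.0),
    "probability": (0.0, 1.0),
}

def is_mathematically_possible(name, value):
    """Return True only if value is a finite number inside the defined range."""
    bounds = BOUNDS.get(name)
    if bounds is None:
        raise ValueError(f"no bounds defined for {name!r}")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if not math.isfinite(value):
        return False
    low, high = bounds
    return low <= value <= high
```

Note that there is no threshold to tune here: an RSI of 65 passes and an RSI of 140 fails, regardless of what the strategy does with the number.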
Why This Matters
- Keeps BS detection module focused and maintainable
- Prevents confusion between validation and signal generation
- Makes validation more robust across different strategies
- Reduces risk of over-fitting validation to current market conditions

3. Automation Over Human Intervention

Distance-Ready Systems
- System should work without human babysitting
- Validation should be fundamental, not point-in-time
- Approaches should scale across time and market conditions
- Minimize dependency on human fine-tuning
Value of Invariants
- Mathematical properties that always hold
- Market constraints that can't be violated (e.g. a bar's high is never below its low)
- Relationships that persist across market conditions
- Rules that don't need human verification
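As a rough illustration of invariants that need no human verification, the sketch below checks a single OHLCV bar against relationships that hold by definition. The `Bar` type and the check names are hypothetical; a real system would pick its own set of invariants.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float

def violated_invariants(bar):
    """Return the names of the invariants this bar violates (empty if none)."""
    checks = {
        # The high is, by definition, the largest price in the bar.
        "high_is_max": bar.high >= max(bar.open, bar.close, bar.low),
        # The low is, by definition, the smallest price in the bar.
        "low_is_min": bar.low <= min(bar.open, bar.close, bar.high),
        # Traded volume cannot be negative.
        "volume_non_negative": bar.volume >= 0,
    }
    return [name for name, ok in checks.items() if not ok]
```

For example, `violated_invariants(Bar(100, 98, 99, 104, 1000))` reports both `high_is_max` and `low_is_min`, because a high of 98 sits below the close and below the low.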

4. Efficiency Over Completeness

Resource Allocation
- Better to catch 80% of BS with 20% of resources than to spend the remaining 80% chasing the last 20%
- Focus on high-impact validation first
- Avoid diminishing returns in validation depth
- Keep validation overhead minimal
Speed Considerations
- Fast approximate validation > Slow perfect validation
- Validation should not become a bottleneck
- Quick feedback loops enable faster iteration
- Time is a critical resource in trading
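One simple way to keep validation overhead low is to run the cheapest checks first and stop at the first failure, so expensive checks only run on outputs that survive the cheap ones. The sketch below assumes each check carries a rough relative cost; the check names, costs, and the `reference_price` field are all made up for illustration.

```python
import math

def run_checks(output, checks):
    """Run checks in order of increasing cost and stop at the first failure.

    checks: list of (name, relative_cost, predicate) tuples.
    Returns (passed, name_of_failed_check_or_None).
    """
    for name, _cost, predicate in sorted(checks, key=lambda c: c[1]):
        if not predicate(output):
            return False, name
    return True, None

# Hypothetical checks, deliberately listed out of cost order.
checks = [
    ("cross_source_match", 50, lambda o: o["price"] == o.get("reference_price")),
    ("has_price", 1, lambda o: "price" in o),
    ("price_is_finite", 2, lambda o: math.isfinite(o["price"])),
]

result = run_checks({"price": float("inf"), "reference_price": 10.0}, checks)
```

Here the expensive `cross_source_match` check never runs, because `price_is_finite` already rejects the output.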

5. Robustness Over Sophistication

Simple > Complex
- Fewer moving parts = Fewer failure points
- Simple rules are easier to verify
- Complex validation can introduce its own bugs
- Maintenance cost matters
Failure Modes
- Clear failure states are better than ambiguous ones
- System should fail predictably
- Recovery should be automatic where possible
- Failure detection should be straightforward
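Clear failure states can be as simple as a three-valued verdict that separates "the output is BS" from "the validator itself broke". The sketch below is one possible shape, and the names `Verdict`, `validate`, and `should_trade` are hypothetical. Treating an error as a reason not to act (failing closed) is an assumption of this example, not something the criteria above prescribe.

```python
from enum import Enum

class Verdict(Enum):
    PASS = "pass"    # the check ran and the output is acceptable
    FAIL = "fail"    # the check ran and the output was rejected
    ERROR = "error"  # the check itself could not run

def validate(check, output):
    """Run one check and map every outcome to an explicit verdict."""
    try:
        return Verdict.PASS if check(output) else Verdict.FAIL
    except Exception:
        # A crashing check is reported as its own state, never as a pass.
        return Verdict.ERROR

def should_trade(verdict):
    """Fail closed: act only on an explicit PASS (an assumption of this sketch)."""
    return verdict is Verdict.PASS
```

Counting `ERROR` verdicts separately from `FAIL` verdicts also gives a straightforward health signal for the validator itself.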

6. Evolution Over Revolution

Incremental Improvement
- Start with basic validations that work
- Add complexity only when proven necessary
- Keep working parts while improving others
- Build on what generates value
Learning System
- System should get better with more data
- Failures should inform improvements
- Adaptation should be mostly automatic
- Human insights should become system features

Questions for Evaluation

Value Focus
- Does this validation directly impact profit?
- What's the cost/benefit ratio?
- Can we measure its impact on returns?
- Is it solving a real problem?
Automation Potential
- Can this run without human oversight?
- Are the rules fundamentally sound?
- How often will it need updating?
- What are the maintenance requirements?
Resource Usage
- What's the computational cost?
- How does it scale with market complexity?
- What's the development overhead?
- Is the validation cost justified?
Reliability Assessment
- How does it fail?
- What are the edge cases?
- Can it recover automatically?
- How do we monitor its health?

Red Flags in Approaches

Resource Traps
- Heavy reliance on human review
- Complex validation chains
- High computational overhead
- Frequent manual updates needed
False Sophistication
- Over-engineered solutions
- Unnecessary precision
- Too many configuration options
- Complex failure modes
Missing Fundamentals
- No clear connection to profit
- Can't run autonomously
- Requires constant tuning
- Not based on invariants

Remember: The goal is not to build a perfect BS detector, but one that effectively contributes to profitable trading decisions while maintaining operational efficiency.
