
BS Detection Phase 1: Systematic Sampling with Minimal Human Oversight

· 3 min read
Max Kaido
Architect

The key part is the human in the loop, whose involvement should be minimal but effective. My suggestion: let's work the other way around and start from that human.

  1. Artifacts are already part of our system. You can find them in ArtifactModule.
  2. Every day, MAAT creates an artifact (TG message + gist) in the Mercury channel. The TG message is both the entry point for the human and the notification (once per day is bearable :D).
  3. The key information lives in the gist.
  4. The gist contains several files. Each file holds the BS candidates from one specific task (in practice, one method).
  5. Each file should be crafted so that a human can simply copy its content, send it straight to an LLM with a huge context, let it do the heavy lifting, and get back a response with insights.
  6. Approximate file structure: {task-specific prompt} + [BS candidates].
  7. How do we collect BS candidates? They come from the individual methods that can produce BS. Each method is assigned a BS sampling probability based on its approximate number of invocations per day.
  8. MAAT implements a method bsCollector => (gistFileName, samplingProbability) => logic to store a BS candidate.
  9. Once steps 1-8 are implemented (basic storage of BS candidates plus the human notification system), we go to the specific methods known to be BS sources and give each of them an optional extra parameter for the glorious bsCollector.
  10. We then update those methods slightly, adding a few lines of code at the place where a BS candidate may appear.
  11. Those lines do the sampling, with the defined probability and only when a bsCollector is present. Basically, that logic prepares one string entity that we can include among the BS candidates in the specific file.
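The file structure from item 6 can be sketched roughly like this. It is only an illustration: `BsCandidateFile`, `renderGistFile`, and the candidate separator are hypothetical names and choices, not something MAAT already has.

```typescript
// Hypothetical shape of one gist file: a task-specific prompt plus sampled candidates.
interface BsCandidateFile {
  prompt: string;       // instructions for the LLM, specific to one method
  candidates: string[]; // each candidate already prepared as a single string
}

// Render the file so that its whole content can be pasted into an LLM as-is.
function renderGistFile(file: BsCandidateFile): string {
  const numbered = file.candidates.map(
    (candidate, index) => "--- Candidate " + (index + 1) + " ---\n" + candidate
  );
  // Prompt first, then a blank line, then one block per candidate.
  return [file.prompt.trim(), ""].concat(numbered).join("\n");
}

console.log(
  renderGistFile({
    prompt: "Review each candidate below and flag outputs that look wrong.",
    candidates: ['{"input":1,"output":2}', '{"input":3,"output":-1}'],
  })
);
```

Putting the prompt at the top of the file means the human never has to write anything: copy, paste, read the answer.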

Core Design

MAAT, our validation system, will implement a simple yet effective approach:

  1. Daily Artifact Generation: MAAT generates a daily Telegram message + GitHub Gist containing potential BS candidates
  2. Method-specific Sampling: BS candidates are collected from specific methods with known BS potential
  3. LLM-ready Format: Content is pre-formatted for direct analysis by large context LLMs
  4. Probabilistic Collection: Sampling occurs with configurable probability based on method frequency
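One possible way to derive a sampling probability from method frequency is to aim for a fixed number of samples per method per day. This is an assumption for illustration: `samplingProbabilityFor` and the idea of a per-day target are not part of the design above.

```typescript
// Assumption: we want roughly `targetSamplesPerDay` candidates per method per day.
function samplingProbabilityFor(
  invocationsPerDay: number,
  targetSamplesPerDay: number
): number {
  if (invocationsPerDay <= 0 || targetSamplesPerDay <= 0) return 0;
  // Expected samples = invocations * probability, so solve for probability and cap it at 1.
  return Math.min(1, targetSamplesPerDay / invocationsPerDay);
}

console.log(samplingProbabilityFor(10000, 20)); // 0.002
console.log(samplingProbabilityFor(5, 20));     // 1 (rare method: sample every call)
```

With this rule a hot method and a rarely called one contribute a similar number of candidates, which keeps each gist file at a predictable size.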

Implementation Details

1. BS Collector Function

The core of our implementation is a simple collector function:

function bsCollector(gistFileName: string, samplingProbability: number) {
  return (candidateData: any) => {
    // Math.random() returns a value in [0, 1), so >= guarantees that
    // a probability of 0 never samples and a probability of 1 always does
    if (Math.random() >= samplingProbability) return;

    // Store the BS candidate for inclusion in the specified gist file
    storeBSCandidate(gistFileName, candidateData);
  };
}
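The collector relies on `storeBSCandidate`, which is not shown here. A minimal in-memory sketch could look like the following. The per-file cap, the `drainCandidates` helper, and the choice of JSON as the stored form are all assumptions; a real version would probably persist candidates somewhere durable.

```typescript
// Hypothetical in-memory store: gist file name -> serialized candidates for the current day.
let candidatesByFile: Record<string, string[]> = {};

// Assumed cap per file so one busy method cannot blow past the LLM context window.
const MAX_CANDIDATES_PER_FILE = 50;

function storeBSCandidate(gistFileName: string, candidateData: unknown): void {
  const bucket = candidatesByFile[gistFileName] || [];
  if (bucket.length >= MAX_CANDIDATES_PER_FILE) return; // file is full, drop the extra sample
  // Assumes candidateData is JSON-serializable (no circular references).
  bucket.push(JSON.stringify(candidateData));
  candidatesByFile[gistFileName] = bucket;
}

// Hand everything collected so far to the daily job and start a fresh day.
function drainCandidates(): Record<string, string[]> {
  const collected = candidatesByFile;
  candidatesByFile = {};
  return collected;
}

storeBSCandidate("transformMarketData.md", { input: 1, output: -1 });
console.log(drainCandidates());
```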

2. Integration Points

We'll integrate the BS collector at specific points in our codebase:

// Example integration in a method known to produce BS
async function transformMarketData(data: MarketData, options: Options) {
  // Optional collector injection
  const collector = options.bsCollector || null;

  const result = performTransformation(data);

  // Sample with configurable probability if collector is provided
  if (collector) {
    collector({
      input: data,
      output: result,
      context: { methodName: 'transformMarketData', timestamp: new Date() },
    });
  }

  return result;
}
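The object passed to the collector eventually has to become one string entry in a gist file. Here is a rough sketch of that step, assuming JSON is an acceptable format and that oversized candidates may be truncated. `BsCandidate`, `toCandidateString`, and the 2000-character default are hypothetical.

```typescript
// Hypothetical flattened form of what the integration point hands to the collector.
interface BsCandidate {
  methodName: string;
  timestamp: Date;
  input: unknown;
  output: unknown;
}

// Turn one candidate into a single string, truncated to an assumed size limit
// so that a huge input cannot crowd out the other candidates in the file.
function toCandidateString(candidate: BsCandidate, maxChars: number = 2000): string {
  const body = JSON.stringify({
    method: candidate.methodName,
    at: candidate.timestamp.toISOString(),
    input: candidate.input,
    output: candidate.output,
  });
  if (body.length <= maxChars) return body;
  return body.slice(0, maxChars) + "...[truncated]";
}

console.log(
  toCandidateString({
    methodName: "transformMarketData",
    timestamp: new Date(),
    input: { price: 100 },
    output: { price: -100 },
  })
);
```

Truncation loses information, so the limit is a trade-off between seeing each candidate in full and fitting more candidates into one LLM context.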

3. Daily Artifact Creation

Once per day, MAAT compiles all collected BS candidates:

  1. Creates a GitHub Gist with separate files for each method
  2. Each file contains:
    • Method-specific prompt for LLM analysis
    • Collected BS candidates in a structured format
  3. Posts a Telegram message to the Mercury channel with a link to the Gist
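The compile step can be sketched as two pure functions that only build the data to send, without doing any network calls. The `files` map in `GistPayload` follows the request body of GitHub's "create a gist" endpoint. Everything else (`DailyFile`, `buildGistPayload`, `buildTelegramText`, the message wording) is an assumption for illustration.

```typescript
// Request body shape for GitHub's "create a gist" endpoint (POST /gists).
interface GistPayload {
  description: string;
  public: boolean;
  files: Record<string, { content: string }>;
}

// Hypothetical input: per-file prompt plus the candidates sampled that day.
interface DailyFile {
  prompt: string;
  candidates: string[];
}

function buildGistPayload(day: string, files: Record<string, DailyFile>): GistPayload {
  const gistFiles: Record<string, { content: string }> = {};
  for (const fileName of Object.keys(files)) {
    const file = files[fileName];
    if (file.candidates.length === 0) continue; // nothing sampled today, skip the file
    gistFiles[fileName] = {
      content: file.prompt + "\n\n" + file.candidates.join("\n"),
    };
  }
  return { description: "BS candidates " + day, public: false, files: gistFiles };
}

// Text for the daily Telegram notification; gistUrl would come from the gist API response.
function buildTelegramText(day: string, payload: GistPayload, gistUrl: string): string {
  const fileCount = Object.keys(payload.files).length;
  return "BS candidates for " + day + ": " + fileCount + " file(s)\n" + gistUrl;
}

const dailyPayload = buildGistPayload("2025-01-01", {
  "transformMarketData.md": { prompt: "Flag suspicious outputs.", candidates: ["{}"] },
});
console.log(buildTelegramText("2025-01-01", dailyPayload, "<gist url>"));
```

Keeping these functions free of network calls makes the daily job easy to test; the actual posting to GitHub and Telegram can live in a thin layer around them.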

4. Human Review Process

  1. Human reviews the daily Telegram notification
  2. Copies content from relevant Gist files
  3. Pastes it directly into an LLM with a large context window
  4. Reviews LLM analysis to identify actual BS
  5. Takes action based on findings

Benefits

  1. Low Engineering Overhead: Simple implementation with minimal impact on existing code
  2. Efficient Human Time: One daily review covers multiple potential issues
  3. Wide Coverage: Samples across many methods and execution paths
  4. Scalable: Easy to add new sampling points as needed
  5. LLM-Optimized: Leverages large context models for heavy lifting

Initial Implementation Focus

  1. Implement core bsCollector function
  2. Create daily Gist + Telegram message generation
  3. Add sampling to 3-5 high-priority methods known to produce BS
  4. Define method-specific LLM prompts for effective analysis

Next Steps

After collecting initial data, we'll:

  1. Refine sampling probabilities based on BS frequency
  2. Expand coverage to additional methods
  3. Improve formatting based on LLM analysis feedback

This Phase 1 approach provides immediate value while building toward more sophisticated automated validation in future phases.