DeepSeek R1 Experiment Results

Max Kaido
Architect

Your experimental results provide valuable insights into how to effectively work with DeepSeek R1. Let me summarize the key findings and implications:

Key Findings

  1. Token Prediction Parameter (num_predict):

    • Decreasing num_predict made the truncation worse
    • Increasing it to 1024 fixed the truncation
    • Omitting the parameter (defaulting to unlimited) also worked well
    • This suggests the model needs enough token budget to finish its reasoning before it can give an answer
  2. Model Knowledge vs. Prompt Instructions:

    • The model sometimes concludes differently than expected due to its pre-trained knowledge about RSI thresholds
    • It may not use explicit "yes"/"no" terminology even when reaching the correct conclusion
    • This indicates potential conflicts between the model's training data and our specific instructions
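
To make the num_predict finding concrete, here is a minimal sketch of building the request body with and without the cap. It assumes an Ollama-style /api/generate endpoint, where num_predict is passed inside an "options" object. The model tag "deepseek-r1", the example prompt, and the helper name build_generate_payload are all hypothetical and not taken from the experiment itself.

```python
import json


def build_generate_payload(prompt, model="deepseek-r1", num_predict=None):
    """Build a request body for an Ollama-style /api/generate call.

    Assumption: the backend accepts sampling options under an "options" key.
    When num_predict is None, the key is left out entirely so that the
    backend's own default applies.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if num_predict is not None:
        payload["options"] = {"num_predict": num_predict}
    return payload


# Hypothetical prompt, for illustration only.
default_payload = build_generate_payload("Is the market oversold at RSI 25?")
capped_payload = build_generate_payload(
    "Is the market oversold at RSI 25?", num_predict=1024
)

print(json.dumps(capped_payload, indent=2))
```

Leaving the key out, rather than sending a guessed "unlimited" value, keeps the sketch independent of whatever default a particular backend version uses.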

Implications for Implementation

  1. Parameter Settings:

    • For reasoning models like DeepSeek R1, avoid restricting num_predict unless necessary for resource optimization
    • When setting num_predict, ensure it's large enough to allow for complete reasoning (including closing tags)
    • Consider implementing a check for the </think> closing tag before processing the response
  2. Prompt Engineering:

    • Explicitly emphasize that the model should use our definitions rather than its pre-trained knowledge
    • Include clear instructions about the expected response format
    • Consider adding examples that demonstrate how to handle cases that might conflict with its training
  3. Response Validation:

    • Expand our validation logic to recognize conclusions beyond just "yes"/"no" (e.g., "oversold", "the market is oversold")
    • Implement more sophisticated parsing to extract the model's actual conclusion
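
The closing-tag check and the broader validation described above could be sketched roughly as follows. This is an illustrative heuristic, not the validation logic used in the experiment: the function names, the "truncated"/"unclear" labels, and the exact phrases matched are assumptions.

```python
import re


def split_reasoning(response):
    """Return (reasoning, answer). answer is None when </think> is missing,
    which is treated here as a truncated response."""
    if "</think>" not in response:
        return response, None
    reasoning, _, answer = response.partition("</think>")
    return reasoning.replace("<think>", "").strip(), answer.strip()


def extract_conclusion(response):
    """Map a raw model response to 'yes', 'no', 'unclear', or 'truncated'.

    Only the text after </think> is inspected, so phrases that appear
    inside the reasoning do not affect the result.
    """
    _, answer = split_reasoning(response)
    if answer is None:
        return "truncated"
    text = answer.lower()
    # Negated forms first, so "not oversold" is never read as "oversold".
    if re.search(r"\b(?:not|isn't)\s+oversold\b", text):
        return "no"
    # An explicit yes/no as the first word of the answer.
    leading = re.match(r"\W*(yes|no)\b", text)
    if leading:
        return leading.group(1)
    # Conclusions that skip yes/no terminology, e.g. "the market is oversold".
    if re.search(r"\boversold\b", text):
        return "yes"
    return "unclear"
```

Returning a distinct "truncated" value keeps a cut-off response from being counted as a wrong answer, which matters when comparing num_predict settings.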

Next Steps

  1. Prompt Refinement:

    • Create a system prompt that explicitly instructs the model to:
      • Use only the definitions provided in the prompt
      • Complete its thinking process with a </think> tag
      • Provide an explicit yes/no answer using those exact terms
  2. Robust Testing Framework:

    • Implement repeated testing (N times per test case) to assess reliability
    • Test with various parameter combinations (temperature, top_k, top_p, min_p)
    • Document success rates for different configurations
  3. Response Processing:

    • Develop more sophisticated response parsing that can:
      • Verify the presence of closing tags
      • Extract conclusions even when they don't use explicit yes/no terminology
      • Handle cases where the model's reasoning is correct but its format doesn't match expectations
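
A repeated-testing harness along the lines of step 2 might look like the sketch below. It assumes a caller-supplied generate(prompt, **options) function that returns an already-normalized conclusion such as "yes" or "no". That function, the grid format, and the stand-in fake_generate are hypothetical and only there to make the example runnable without a model.

```python
from collections import Counter
from itertools import product


def success_rate(generate, prompt, expected, n=10, **options):
    """Call generate n times with the same options and return the
    fraction of runs whose result equals expected."""
    outcomes = Counter(generate(prompt, **options) for _ in range(n))
    return outcomes[expected] / n


def sweep(generate, prompt, expected, grid, n=10):
    """Run success_rate for every combination of values in grid.

    grid maps an option name to a list of values to try. The result maps
    each configuration, as a tuple of (name, value) pairs sorted by name,
    to its success rate.
    """
    names = sorted(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results[tuple(config.items())] = success_rate(
            generate, prompt, expected, n=n, **config
        )
    return results


def fake_generate(prompt, temperature=0.0, top_p=1.0):
    """Stand-in for a real model call: answers 'yes' only at temperature 0."""
    return "yes" if temperature == 0.0 else "unclear"


results = sweep(
    fake_generate,
    "hypothetical test prompt",
    "yes",
    {"temperature": [0.0, 0.8], "top_p": [0.9]},
    n=3,
)
for config, rate in results.items():
    print(dict(config), rate)
```

With a real model behind generate, the same results dictionary can be written out to document success rates per configuration.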

Would you like me to draft an updated test implementation that incorporates these findings? I can focus on creating a more robust system prompt and validation logic based on what we've learned.