DeepSeek R1 Experiment Results

March 12, 2025 · 2 min read

Architect

Your experimental results provide valuable insights into how to effectively work with DeepSeek R1. Let me summarize the key findings and implications:

Key Findings

Token Prediction Parameter (num_predict):
- Decreasing num_predict made the issue worse
- Increasing to 1024 fixed the truncation problem
- Not providing the parameter (defaulting to unlimited) also works well
- This suggests the model needs sufficient space to complete its reasoning process
Model Knowledge vs. Prompt Instructions:
- The model sometimes concludes differently than expected due to its pre-trained knowledge about RSI thresholds
- It may not use explicit "yes"/"no" terminology even when reaching the correct conclusion
- This indicates potential conflicts between the model's training data and our specific instructions

Implications for Implementation

Parameter Settings:
- For reasoning models like DeepSeek R1, avoid restricting num_predict unless necessary for resource optimization
- When setting num_predict, ensure it's large enough to allow for complete reasoning (including closing tags)
- Consider implementing a check for the </think> closing tag before processing the response
Prompt Engineering:
- Explicitly emphasize that the model should use our definitions rather than its pre-trained knowledge
- Include clear instructions about the expected response format
- Consider adding examples that demonstrate how to handle cases that might conflict with its training
Response Validation:
- Expand our validation logic to recognize conclusions beyond just "yes"/"no" (e.g., "oversold", "the market is oversold")
- Implement more sophisticated parsing to extract the model's actual conclusion

Next Steps

Prompt Refinement:
- Create a system prompt that explicitly instructs the model to:
  - Use only the definitions provided in the prompt
  - Complete its thinking process with a </think> tag
  - Provide an explicit yes/no answer using those exact terms
Robust Testing Framework:
- Implement repeated testing (N times per test case) to assess reliability
- Test with various parameter combinations (temperature, top_k, top_p, min_p)
- Document success rates for different configurations
Response Processing:
- Develop more sophisticated response parsing that can:
  - Verify the presence of closing tags
  - Extract conclusions even when they don't use explicit yes/no terminology
  - Handle cases where the model's reasoning is correct but its format doesn't match expectations

Would you like me to draft an updated test implementation that incorporates these findings? I can focus on creating a more robust system prompt and validation logic based on what we've learned.

Key Findings​

Implications for Implementation​

Next Steps​

Key Findings

Implications for Implementation

Next Steps