DeepSeek R1 Experiment Results
Your experimental results provide valuable insights into how to effectively work with DeepSeek R1. Let me summarize the key findings and implications:
Key Findings
- Token Prediction Parameter (num_predict):
  - Decreasing num_predict made the issue worse
  - Increasing to 1024 fixed the truncation problem
  - Not providing the parameter (defaulting to unlimited) also works well
  - This suggests the model needs sufficient space to complete its reasoning process
- Model Knowledge vs. Prompt Instructions:
  - The model sometimes concludes differently than expected due to its pre-trained knowledge about RSI thresholds
  - It may not use explicit "yes"/"no" terminology even when reaching the correct conclusion
  - This indicates potential conflicts between the model's training data and our specific instructions
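The num_predict finding can be illustrated with a small request builder. This is a minimal sketch that assumes an Ollama-style generate request, where num_predict is passed under an "options" key; the summary above does not name the runtime, so the payload shape, the function name build_generate_payload, and the model tag "deepseek-r1" are all assumptions for illustration.

```python
from typing import Optional


def build_generate_payload(
    prompt: str,
    num_predict: Optional[int] = None,
    model: str = "deepseek-r1",  # assumed model tag, not taken from the experiments
) -> dict:
    """Build a request body, leaving num_predict out entirely when it is unset."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if num_predict is not None:
        # Only cap generation when the caller explicitly asks for a limit.
        payload["options"] = {"num_predict": num_predict}
    return payload


# No cap: the runtime default applies (reported above to work well).
unlimited = build_generate_payload("Is the market oversold?")

# Explicit cap of 1024, the value that fixed truncation in the experiments.
capped = build_generate_payload("Is the market oversold?", num_predict=1024)
```

Omitting the key, rather than sending a placeholder value, keeps the "not provided" case identical to the one that was actually tested.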
Implications for Implementation
- Parameter Settings:
  - For reasoning models like DeepSeek R1, avoid restricting num_predict unless necessary for resource optimization
  - When setting num_predict, ensure it's large enough to allow for complete reasoning (including closing tags)
  - Consider implementing a check for the </think> closing tag before processing the response
- Prompt Engineering:
  - Explicitly emphasize that the model should use our definitions rather than its pre-trained knowledge
  - Include clear instructions about the expected response format
  - Consider adding examples that demonstrate how to handle cases that might conflict with its training
- Response Validation:
  - Expand our validation logic to recognize conclusions beyond just "yes"/"no" (e.g., "oversold", "the market is oversold")
  - Implement more sophisticated parsing to extract the model's actual conclusion
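The closing-tag check and the broader conclusion matching could look roughly like the following. This is a hedged sketch, not the validation logic used in the experiments: the function names and the phrase patterns are hypothetical, and the matching is a naive heuristic that would need tuning against real responses.

```python
import re
from typing import Optional

THINK_CLOSE = "</think>"


def answer_after_thinking(text: str) -> Optional[str]:
    """Return the text after the last </think> tag, or None if the tag is missing."""
    if THINK_CLOSE not in text:
        return None  # reasoning was probably truncated
    return text.rsplit(THINK_CLOSE, 1)[1].strip()


def extract_conclusion(text: str) -> Optional[bool]:
    """Map a response to True/False, or None if truncated or unrecognized."""
    answer = answer_after_thinking(text)
    if answer is None:
        return None
    lowered = answer.lower()

    # An explicit yes/no at the start of the answer takes priority.
    leading = re.match(r"\W*(yes|no)\b", lowered)
    if leading:
        return leading.group(1) == "yes"

    # Check negated phrasing first so "not oversold" is not read as "oversold".
    if re.search(r"(?:\bnot|n't)\s+oversold\b", lowered):
        return False
    if re.search(r"\boversold\b", lowered):
        return True
    return None
```

Returning None for both truncated and unrecognized responses keeps the sketch short; a fuller version would likely report those two cases separately so that truncation can be tracked on its own.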
Next Steps
- Prompt Refinement:
  - Create a system prompt that explicitly instructs the model to:
    - Use only the definitions provided in the prompt
    - Complete its thinking process with a </think> tag
    - Provide an explicit yes/no answer using those exact terms
- Robust Testing Framework:
  - Implement repeated testing (N times per test case) to assess reliability
  - Test with various parameter combinations (temperature, top_k, top_p, min_p)
  - Document success rates for different configurations
- Response Processing:
  - Develop more sophisticated response parsing that can:
    - Verify the presence of closing tags
    - Extract conclusions even when they don't use explicit yes/no terminology
    - Handle cases where the model's reasoning is correct but its format doesn't match expectations
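As a starting point for the repeated-testing idea, here is a minimal sketch of a harness that runs each parameter combination N times and records the success rate. The model call is abstracted behind a hypothetical callable (model_fn), and fake_model and ends_with_yes are stand-ins so the example runs without a model; none of these names come from the experiments.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signature: (prompt, sampling options) -> raw response text.
ModelFn = Callable[[str, Dict[str, float]], str]
CheckFn = Callable[[str], Optional[bool]]


def success_rate(model_fn: ModelFn, prompt: str, expected: bool,
                 options: Dict[str, float], check: CheckFn, n: int = 10) -> float:
    """Call the model n times and return the fraction of expected conclusions."""
    hits = sum(1 for _ in range(n) if check(model_fn(prompt, options)) == expected)
    return hits / n


def sweep(model_fn: ModelFn, prompt: str, expected: bool,
          grid: Dict[str, List[float]], check: CheckFn,
          n: int = 10) -> List[Tuple[Dict[str, float], float]]:
    """Measure the success rate for every combination of values in the grid."""
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        options = dict(zip(keys, values))
        results.append((options, success_rate(model_fn, prompt, expected,
                                              options, check, n)))
    return results


def fake_model(prompt: str, options: Dict[str, float]) -> str:
    # Stand-in for a real model call; always returns a complete answer.
    return "<think>RSI is below the threshold.</think>yes"


def ends_with_yes(text: str) -> Optional[bool]:
    # Deliberately simple check used only for this demonstration.
    return text.strip().lower().endswith("yes")


results = sweep(fake_model, "Is the market oversold?", True,
                {"temperature": [0.2, 0.8], "top_p": [0.9]}, ends_with_yes, n=3)
for options, rate in results:
    print(options, f"{rate:.0%}")
```

Swapping fake_model for a real client call and ends_with_yes for stricter validation would turn this into the documented success-rate table per configuration.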
Would you like me to draft an updated test implementation that incorporates these findings? I can focus on creating a more robust system prompt and validation logic based on what we've learned.
