DeepSeek R1: Numerical Reasoning Capabilities for Market Analysis
DeepSeek R1 1.5B has shown impressive capabilities for numerical reasoning in financial market analysis. Our experiments revealed several key insights that will significantly improve our market comparison service, particularly for tournament scenarios where reliable numerical analysis is critical.
Key Discoveries
1. Thinking Process and Token Management
DeepSeek R1 uses <think>...</think> tags to structure its reasoning process, which provides valuable transparency into how it reaches conclusions. However, we discovered several important aspects of this mechanism:
- Token Prediction Parameter: Setting num_predict too low can lead to truncated thinking, with the model failing to close its </think> tag. This creates problems for downstream processing.
- Optimal Configuration: Always set num_predict: -1 (unlimited) explicitly. While other values like 1024+ might work, using the explicit unlimited setting prevents future regressions when defaults are changed or "optimized" in future iterations.
- Explicit Over Implicit: Never rely on default values for critical parameters like num_predict. Even if the current default is -1, explicitly setting it ensures consistent behavior across different environments and future versions.
// ALWAYS use this explicit configuration for reasoning tasks
const response = await ollamaService.generateRawText(prompt, Model.R1, {
  temperature: 0.1,
  num_predict: -1, // Explicitly unlimited tokens - never rely on defaults
});
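One way to keep the explicit setting from being dropped by accident is to build the options in a single helper. The sketch below is only an illustration: the GenerationOptions interface and the reasoningOptions function are hypothetical names, and the option shape is assumed from the two fields used in the call above.

```typescript
// Hypothetical option shape: only these two fields appear in the call above.
interface GenerationOptions {
  temperature?: number;
  num_predict?: number;
}

// Build options for reasoning calls. num_predict is written last, so a
// caller-supplied override cannot reintroduce a finite token limit.
function reasoningOptions(overrides: GenerationOptions = {}): GenerationOptions {
  return {
    temperature: 0.1,
    ...overrides,
    num_predict: -1, // explicitly unlimited, never inherited from defaults
  };
}
```

The result could then be passed as the third argument of generateRawText. Ignoring a caller's num_predict is a deliberate choice in this sketch; a softer variant might only log a warning instead.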
2. Domain Knowledge Integration
One of the most fascinating discoveries is how DeepSeek R1 integrates its pre-trained knowledge with the task at hand:
- Financial Domain Knowledge: The model has strong prior knowledge about financial indicators like RSI, including standard thresholds (e.g., RSI < 30 for oversold conditions).
- Knowledge Override: When our prompts specified thresholds that differ from the standard ones, the model sometimes prioritized its pre-trained knowledge and reached conclusions other than those the prompt called for.
- Explicit Instructions: To ensure the model follows specific thresholds that differ from standard ones, explicit instructions in the system prompt are effective:
const systemPrompt =
'Follow this context, not standard definitions. If RSI is below 30, the market is extremely oversold. If RSI is between 30 and 50, the market is neutral. If RSI is above 70, the market is overbought. Based only on RSI value determine the market condition.';
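To notice when the model falls back on its pre-trained thresholds, its answer can be compared with a deterministic reading of the same rules. The following is a minimal sketch, assuming the thresholds exactly as written in the system prompt above and treating "between 30 and 50" as inclusive. The names RsiClassification, classifyRsi and modelFollowedPrompt are hypothetical.

```typescript
type RsiClassification = 'oversold' | 'neutral' | 'overbought' | 'unspecified';

// Deterministic reference for the rules stated in the system prompt.
// Assumption: "between 30 and 50" includes both endpoints.
function classifyRsi(rsi: number): RsiClassification {
  if (rsi < 30) return 'oversold';
  if (rsi >= 30 && rsi <= 50) return 'neutral';
  if (rsi > 70) return 'overbought';
  // The prompt defines no condition for values above 50 up to 70.
  return 'unspecified';
}

// True when the model's answer matches the prompt's rules for this RSI value.
function modelFollowedPrompt(rsi: number, modelAnswer: RsiClassification): boolean {
  return classifyRsi(rsi) === modelAnswer;
}
```

A mismatch for an RSI value the prompt does cover would suggest the model applied its standard definitions instead of the supplied context.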
3. Structured Output Handling
When combining reasoning with structured outputs (like JSON), we found important patterns:
- Two-Phase Approach with Model Specialization: For reliable results, separate the reasoning phase from the structured output phase, using a different model for each task:
  - First, use DeepSeek R1 for the reasoning phase with generateRawText
  - Then, use a non-reasoning model like Llama 3.2 for the JSON parsing phase with generate and a JSON schema
- Avoid Reasoning Models for Parsing: Using reasoning models like DeepSeek R1 for JSON parsing is inefficient and problematic:
  - It's not their strength
  - Ollama may hide the reasoning part that's still being generated
  - This "magic" is slow and can lead to unexpected behavior
  - Lighter models like Llama 3.2 are more efficient for parsing natural language into structured data
- Response Parsing: The Ollama service may return already-parsed objects when using JSON schema validation, requiring careful handling:
// Two-phase approach with specialized models
// Phase 1: Reasoning with DeepSeek R1
const reasoningResponse = await ollamaService.generateRawText(
  prompt,
  Model.R1,
  {
    temperature: 0.1,
    num_predict: -1,
  },
);
// Extract conclusion
const thinkCloseIndex = reasoningResponse.indexOf('</think>');
const conclusionText =
  thinkCloseIndex !== -1
    ? reasoningResponse.substring(thinkCloseIndex + '</think>'.length).trim()
    : reasoningResponse.trim();
// Phase 2: Structured parsing with Llama 3.2 (no extra prompt needed)
const jsonResponse = await ollamaService.generate(
  `Parse this conclusion: ${conclusionText}`,
  Model.LLAMA3_2_3B,
  {
    type: 'object',
    properties: {
      isOversold: { type: 'boolean' },
      isUncertain: { type: 'boolean' }, // Flag for uncertainty detection
    },
    required: ['isOversold', 'isUncertain'],
  },
  {
    temperature: 0.1,
  },
);
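Because the service may hand back either a JSON string or an already-parsed object, one possible way to handle both is a small normalizing step before the result is used. This is a sketch under assumptions: the return type of ollamaService.generate is not shown in this document, so the input is typed as unknown, and OversoldResult and normalizeOversoldResult are hypothetical names.

```typescript
interface OversoldResult {
  isOversold: boolean;
  isUncertain: boolean;
}

// Accept a JSON string or an already-parsed object and return a checked result.
// Throws if the payload is not an object with the two required boolean fields.
function normalizeOversoldResult(response: unknown): OversoldResult {
  // JSON.parse throws a SyntaxError on malformed input, which callers can catch.
  const value: unknown =
    typeof response === 'string' ? JSON.parse(response) : response;

  if (typeof value !== 'object' || value === null) {
    throw new Error('Expected a JSON object');
  }

  const { isOversold, isUncertain } = value as Record<string, unknown>;
  if (typeof isOversold !== 'boolean' || typeof isUncertain !== 'boolean') {
    throw new Error('Response is missing required boolean fields');
  }
  return { isOversold, isUncertain };
}
```

Under that assumption, calling normalizeOversoldResult(jsonResponse) gives the rest of the pipeline one shape to work with, whichever form the service returned.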
4. Handling Uncertainty in Market Conditions
A critical insight for real-world applications is dealing with uncertainty in market conditions:
- Embrace Uncertainty: In real-world scenarios, it's often objectively difficult to definitively determine if a market is overbought or oversold, regardless of prompt sophistication.
- Two-Step Uncertainty Handling:
  - Allow DeepSeek R1 to express uncertainty in its reasoning (let it think naturally)
  - In the JSON parsing step, include an explicit isUncertain property that the parsing model can set to true when it detects uncertainty in the conclusion
- Separation of Concerns:
  - Don't force the reasoning model to make definitive judgments in uncertain cases
  - Don't add uncertainty handling to the reasoning step as it decreases quality and wastes tokens
  - Use the parsing model to detect and flag uncertainty without forcing it to make the final decision
// JSON schema with uncertainty handling
const schema = {
  type: 'object',
  properties: {
    marketCondition: {
      type: 'string',
      enum: ['oversold', 'neutral', 'overbought'],
    },
    isUncertain: {
      type: 'boolean',
      description: 'Set to true if the conclusion expresses uncertainty',
    },
    confidenceScore: { type: 'number', minimum: 0, maximum: 1 },
  },
  required: ['marketCondition', 'isUncertain'],
};
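Downstream code still has to decide what to do with a flagged result. The sketch below shows one possible consumer of an object matching the schema above: it returns the condition only when the result is not flagged as uncertain. The MarketAssessment type, the actionableCondition function and the minConfidence cut-off of 0.5 are all assumptions made for illustration, not values from our experiments.

```typescript
type MarketCondition = 'oversold' | 'neutral' | 'overbought';

interface MarketAssessment {
  marketCondition: MarketCondition;
  isUncertain: boolean;
  confidenceScore?: number; // optional: not listed in the schema's required fields
}

// Return the condition only when it is safe to act on; null means "no decision".
// minConfidence is a hypothetical cut-off chosen for illustration.
function actionableCondition(
  assessment: MarketAssessment,
  minConfidence = 0.5,
): MarketCondition | null {
  if (assessment.isUncertain) return null;
  if (
    assessment.confidenceScore !== undefined &&
    assessment.confidenceScore < minConfidence
  ) {
    return null;
  }
  return assessment.marketCondition;
}
```

Returning null rather than a default condition keeps the final decision with the caller, in line with not forcing a definitive judgment in ambiguous cases.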
5. Conclusion Extraction
For evaluation or further processing, extracting just the conclusion after the thinking process improves reliability:
// Extract only the conclusion part (after </think> tag)
const thinkCloseIndex = reasoning.indexOf('</think>');
const conclusionText =
  thinkCloseIndex !== -1
    ? reasoning.substring(thinkCloseIndex + '</think>'.length).trim()
    : reasoning.trim();
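The snippet above falls back to the whole response when no closing tag is found, which would also pass a truncated thinking block through as if it were a conclusion. A slightly stricter variant, sketched here with the hypothetical names ReasoningParts and splitReasoning, separates the two cases and reports truncation explicitly.

```typescript
interface ReasoningParts {
  thinking: string;
  conclusion: string;
  truncated: boolean; // true when <think> was opened but never closed
}

const THINK_OPEN = '<think>';
const THINK_CLOSE = '</think>';

function splitReasoning(raw: string): ReasoningParts {
  const openIndex = raw.indexOf(THINK_OPEN);
  const closeIndex = raw.indexOf(THINK_CLOSE);

  if (closeIndex === -1) {
    if (openIndex !== -1) {
      // Thinking started but never finished: there is no usable conclusion.
      return {
        thinking: raw.substring(openIndex + THINK_OPEN.length).trim(),
        conclusion: '',
        truncated: true,
      };
    }
    // No tags at all: treat the whole response as the conclusion.
    return { thinking: '', conclusion: raw.trim(), truncated: false };
  }

  // Some responses may omit the opening tag; start from 0 in that case.
  const thinkingStart =
    openIndex !== -1 && openIndex < closeIndex
      ? openIndex + THINK_OPEN.length
      : 0;

  return {
    thinking: raw.substring(thinkingStart, closeIndex).trim(),
    conclusion: raw.substring(closeIndex + THINK_CLOSE.length).trim(),
    truncated: false,
  };
}
```

A caller could check the truncated flag and retry or raise an error before handing the conclusion to the parsing model.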
Performance Insights
DeepSeek R1 1.5B demonstrated impressive numerical reasoning capabilities:
- Basic Comparisons: Reliably determines relationships between values (greater than, less than, equal to)
- Decimal Handling: Correctly processes and compares decimal values
- Threshold Evaluation: Accurately evaluates values against thresholds
- Multi-Condition Analysis: Successfully navigates multiple conditions and ranges
The model's performance was consistent and reliable when properly configured, with tests completing in reasonable time (typically 4-6 seconds per reasoning task).
Implications for Market Comparison Service
These findings have significant implications for our market comparison service:
- Reliable Numerical Analysis: DeepSeek R1 can be trusted for numerical reasoning tasks in market analysis, provided we configure it correctly with explicit parameters.
- Domain-Specific Instructions: When we need the model to use non-standard thresholds or criteria, explicit instructions in the system prompt are essential.
- Two-Phase Processing with Specialized Models: Use DeepSeek R1 for reasoning and lighter models like Llama 3.2 for structured output parsing.
- Explicit Token Management: Always explicitly set num_predict: -1 for reasoning tasks to ensure complete thinking processes.
- Response Validation: Always check for the presence of closing </think> tags before processing responses.
- Uncertainty Handling: Include uncertainty detection in the JSON schema to handle ambiguous market conditions without forcing definitive judgments.
Conclusion
DeepSeek R1 1.5B is well-suited for numerical reasoning tasks in financial market analysis. Its ability to handle complex comparisons, integrate domain knowledge, and provide transparent reasoning makes it an excellent choice for our market comparison service.
By implementing the patterns and configurations discovered in our experiments, we can build a more reliable and effective market comparison system that leverages DeepSeek R1's strengths while mitigating potential issues.
These insights will directly inform the development of our MarketComparisonService, ensuring it provides accurate and reliable comparisons during tournaments.
