DeepSeek R1: Numerical Reasoning Capabilities for Market Analysis

March 12, 2025 · 4 min read

Architect

DeepSeek R1 1.5B has shown impressive capabilities for numerical reasoning in financial market analysis. Our experiments revealed several key insights that will significantly improve our market comparison service, particularly for tournament scenarios where reliable numerical analysis is critical.

Key Discoveries

1. Thinking Process and Token Management

DeepSeek R1 uses <think>...</think> tags to structure its reasoning process, which provides valuable transparency into how it reaches conclusions. However, we discovered several important aspects of this mechanism:

Token Prediction Parameter: Setting num_predict too low can lead to truncated thinking, with the model failing to close its </think> tag. This creates problems for downstream processing.
Optimal Configuration: Always set num_predict: -1 (unlimited) explicitly. Finite values of 1024 or more might also work, but the explicit unlimited setting guards against regressions if defaults are later changed or "optimized".
Explicit Over Implicit: Never rely on default values for critical parameters like num_predict. Even if the current default is -1, explicitly setting it ensures consistent behavior across different environments and future versions.

// ALWAYS use this explicit configuration for reasoning tasks
const response = await ollamaService.generateRawText(prompt, Model.R1, {
  temperature: 0.1,
  num_predict: -1, // Explicitly unlimited tokens - never rely on defaults
});
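
A truncated response can also be caught before it reaches downstream processing. The following is a minimal sketch, not part of ollamaService or the Ollama API: checkThinkStatus and ThinkStatus are hypothetical names introduced here. It assumes the raw text contains at most one thinking block, and it treats a closing tag without an opening tag as complete, on the assumption that some setups strip the opening tag from the output.

```typescript
// Hypothetical helper (not an Ollama or ollamaService API).
type ThinkStatus = 'complete' | 'truncated' | 'absent';

function checkThinkStatus(rawText: string): ThinkStatus {
  const openIndex = rawText.indexOf('<think>');
  const closeIndex = rawText.indexOf('</think>');

  // No tags at all: the model produced no visible thinking block.
  if (openIndex === -1 && closeIndex === -1) return 'absent';

  // An opening tag with no closing tag after it means generation
  // stopped mid-thought, e.g. because num_predict was too low.
  if (openIndex !== -1 && closeIndex < openIndex) return 'truncated';

  return 'complete';
}
```

A caller could retry or fail fast when the status is 'truncated' instead of passing partial reasoning on to the parsing step.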

2. Domain Knowledge Integration

One of the most fascinating discoveries is how DeepSeek R1 integrates its pre-trained knowledge with the task at hand:

Financial Domain Knowledge: The model has strong prior knowledge about financial indicators like RSI, including standard thresholds (e.g., RSI < 30 for oversold conditions).
Knowledge Override: When our prompts specified different thresholds than the standard ones, the model sometimes prioritized its pre-trained knowledge, leading to different conclusions than expected.
Explicit Instructions: To ensure the model follows specific thresholds that differ from standard ones, explicit instructions in the system prompt are effective:

const systemPrompt =
  'Follow this context, not standard definitions. If RSI is below 30, the market is extremely oversold. If RSI is between 30 and 50, the market is neutral. If RSI is above 70, the market is overbought. Based only on RSI value determine the market condition.';
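
To test whether the model followed the prompt rather than its priors, a deterministic reference for the same thresholds can help. The sketch below encodes the ranges from the example prompt; classifyRsiPerPrompt and PromptCondition are hypothetical names. Treating "between 30 and 50" as inclusive is an assumption, and values the prompt does not cover (above 50 up to and including 70) are reported as 'unspecified' rather than guessed.

```typescript
// Hypothetical reference classifier mirroring the example system prompt.
type PromptCondition = 'oversold' | 'neutral' | 'overbought' | 'unspecified';

function classifyRsiPerPrompt(rsi: number): PromptCondition {
  if (rsi < 30) return 'oversold';
  // Assumption: "between 30 and 50" includes both endpoints.
  if (rsi >= 30 && rsi <= 50) return 'neutral';
  if (rsi > 70) return 'overbought';
  // The prompt gives no rule for this range, so no label is invented.
  return 'unspecified';
}
```

Comparing the model's conclusion with this reference on a handful of RSI values shows quickly whether the prompt or the pre-trained knowledge won.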

3. Structured Output Handling

When combining reasoning with structured outputs (like JSON), we found important patterns:

Two-Phase Approach with Model Specialization: For reliable results, separate the reasoning phase from the structured output phase using different models for each task:
1. First, use DeepSeek R1 for the reasoning phase with generateRawText
2. Then, use a non-reasoning model like Llama 3.2 for the JSON parsing phase with generate and a JSON schema
Avoid Reasoning Models for Parsing: Using reasoning models like DeepSeek R1 for JSON parsing is inefficient and problematic:
- It's not their strength
- Ollama may hide the reasoning output even though the model still generates it
- This "magic" is slow and can lead to unexpected behavior
- Lighter models like Llama 3.2 are more efficient for parsing natural language into structured data
Response Parsing: The Ollama service may return already-parsed objects when using JSON schema validation, requiring careful handling:

// Two-phase approach with specialized models
// Phase 1: Reasoning with DeepSeek R1
const reasoningResponse = await ollamaService.generateRawText(
  prompt,
  Model.R1,
  {
    temperature: 0.1,
    num_predict: -1,
  },
);

// Extract conclusion
const thinkCloseIndex = reasoningResponse.indexOf('</think>');
const conclusionText =
  thinkCloseIndex !== -1
    ? reasoningResponse.substring(thinkCloseIndex + '</think>'.length).trim()
    : reasoningResponse.trim();

// Phase 2: Structured parsing with Llama 3.2 (no extra prompt needed)
const jsonResponse = await ollamaService.generate(
  `Parse this conclusion: ${conclusionText}`,
  Model.LLAMA3_2_3B,
  {
    type: 'object',
    properties: {
      isOversold: { type: 'boolean' },
      isUncertain: { type: 'boolean' }, // Flag for uncertainty detection
    },
    required: ['isOversold', 'isUncertain'],
  },
  {
    temperature: 0.1,
  },
);
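
Because the parsing call may return either a JSON string or an already-parsed object, it can help to normalize the result before using it. This is a minimal sketch under that assumption; normalizeOversoldResult and OversoldResult are hypothetical names, and the exact return type of ollamaService.generate is not confirmed here.

```typescript
// Hypothetical result shape matching the schema in the example above.
interface OversoldResult {
  isOversold: boolean;
  isUncertain: boolean;
}

function normalizeOversoldResult(response: unknown): OversoldResult {
  // Accept either a JSON string or an already-parsed object.
  const value: unknown =
    typeof response === 'string' ? JSON.parse(response) : response;

  if (typeof value !== 'object' || value === null) {
    throw new Error('Expected a JSON object');
  }

  const { isOversold, isUncertain } = value as Record<string, unknown>;
  if (typeof isOversold !== 'boolean' || typeof isUncertain !== 'boolean') {
    throw new Error('isOversold and isUncertain must both be booleans');
  }

  return { isOversold, isUncertain };
}
```

Validating the two fields at runtime keeps a malformed parsing response from silently propagating as undefined values.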

4. Handling Uncertainty in Market Conditions

A critical insight for real-world applications is dealing with uncertainty in market conditions:

Embrace Uncertainty: In real-world scenarios, it is often genuinely hard to determine whether a market is overbought or oversold, regardless of prompt sophistication.
Two-Step Uncertainty Handling:
1. Allow DeepSeek R1 to express uncertainty in its reasoning (let it think naturally)
2. In the JSON parsing step, include an explicit isUncertain property that the parsing model can set to true when it detects uncertainty in the conclusion
Separation of Concerns:
- Don't force the reasoning model to make definitive judgments in uncertain cases
- Don't add uncertainty handling to the reasoning step as it decreases quality and wastes tokens
- Use the parsing model to detect and flag uncertainty without forcing it to make the final decision

// JSON schema with uncertainty handling
const schema = {
  type: 'object',
  properties: {
    marketCondition: {
      type: 'string',
      enum: ['oversold', 'neutral', 'overbought'],
    },
    isUncertain: {
      type: 'boolean',
      description: 'Set to true if the conclusion expresses uncertainty',
    },
    confidenceScore: { type: 'number', minimum: 0, maximum: 1 },
  },
  required: ['marketCondition', 'isUncertain'],
};
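
One way to consume this schema downstream is to let the caller, not the parsing model, make the final decision. The sketch below is an illustration only: decide, Decision, and the 0.6 confidence cutoff are hypothetical, and the cutoff in particular is an arbitrary placeholder rather than a measured value.

```typescript
type MarketCondition = 'oversold' | 'neutral' | 'overbought';

// Mirrors the schema above; confidenceScore is optional because
// it is not listed under `required`.
interface MarketAssessment {
  marketCondition: MarketCondition;
  isUncertain: boolean;
  confidenceScore?: number;
}

type Decision =
  | { kind: 'accept'; condition: MarketCondition }
  | { kind: 'defer'; reason: string };

// Hypothetical policy: defer whenever uncertainty is flagged or the
// optional confidence score falls below an assumed cutoff.
function decide(assessment: MarketAssessment, minConfidence = 0.6): Decision {
  if (assessment.isUncertain) {
    return { kind: 'defer', reason: 'parser flagged uncertainty' };
  }
  const score = assessment.confidenceScore;
  if (score !== undefined && score < minConfidence) {
    return { kind: 'defer', reason: `confidence ${score} below ${minConfidence}` };
  }
  return { kind: 'accept', condition: assessment.marketCondition };
}
```

Keeping this policy in ordinary code means the uncertainty flag stays a signal rather than a verdict, which matches the separation of concerns described above.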

5. Conclusion Extraction

For evaluation or further processing, extracting just the conclusion after the thinking process improves reliability:

// Extract only the conclusion part (after </think> tag)
const thinkCloseIndex = reasoning.indexOf('</think>');
const conclusionText =
  thinkCloseIndex !== -1
    ? reasoning.substring(thinkCloseIndex + '</think>'.length).trim()
    : reasoning.trim();

Performance Insights

DeepSeek R1 1.5B demonstrated impressive numerical reasoning capabilities:

Basic Comparisons: Reliably determines relationships between values (greater than, less than, equal to)
Decimal Handling: Correctly processes and compares decimal values
Threshold Evaluation: Accurately evaluates values against thresholds
Multi-Condition Analysis: Successfully navigates multiple conditions and ranges

The model's performance was consistent and reliable when properly configured, with tests completing in reasonable time (typically 4-6 seconds per reasoning task).

Implications for Market Comparison Service

These findings have significant implications for our market comparison service:

Reliable Numerical Analysis: DeepSeek R1 can be trusted for numerical reasoning tasks in market analysis, provided we configure it correctly with explicit parameters.
Domain-Specific Instructions: When we need the model to use non-standard thresholds or criteria, explicit instructions in the system prompt are essential.
Two-Phase Processing with Specialized Models: Use DeepSeek R1 for reasoning and lighter models like Llama 3.2 for structured output parsing.
Explicit Token Management: Always explicitly set num_predict: -1 for reasoning tasks to ensure complete thinking processes.
Response Validation: Always check for the presence of closing </think> tags before processing responses.
Uncertainty Handling: Include uncertainty detection in the JSON schema to handle ambiguous market conditions without forcing definitive judgments.

Conclusion

DeepSeek R1 1.5B is well-suited for numerical reasoning tasks in financial market analysis. Its ability to handle complex comparisons, integrate domain knowledge, and provide transparent reasoning makes it an excellent choice for our market comparison service.

By implementing the patterns and configurations discovered in our experiments, we can build a more reliable and effective market comparison system that leverages DeepSeek R1's strengths while mitigating potential issues.

These insights will directly inform the development of our MarketComparisonService, ensuring it provides accurate and reliable comparisons during tournaments.
