Mastering Neural Theorem Proving: A Step-by-Step Guide to DeepSeek-Prover-V2's Recursive Proof Search
<h2 id="overview">Overview</h2>
<p>DeepSeek-Prover-V2 represents a significant leap forward in automated mathematical reasoning. Built on the Lean 4 proof assistant, this open-source large language model (LLM) introduces a recursive theorem-proving pipeline that combines informal reasoning with rigorous formal verification. At its core lies a cold-start training method that synthesizes training data from scratch, followed by reinforcement learning to refine the model's ability to bridge the gap between human-like mathematical intuition and machine-checkable proofs. The model achieves state-of-the-art results on benchmarks like MiniF2F (88.9% pass) and PutnamBench (49/658 solved). This guide walks you through the key innovations, prerequisites for understanding the approach, and a step-by-step breakdown of the training pipeline.</p><figure style="margin:20px 0"><img src="https://i0.wp.com/syncedreview.com/wp-content/uploads/2025/04/%E5%B1%8F%E5%B9%95%E6%88%AA%E5%9B%BE-2025-04-30-233942.png?resize=593%2C311&amp;ssl=1" alt="Mastering Neural Theorem Proving: A Step-by-Step Guide to DeepSeek-Prover-V2's Recursive Proof Search" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: syncedreview.com</figcaption></figure>
<h2 id="prerequisites">Prerequisites</h2>
<p>Before diving into the details of DeepSeek-Prover-V2, ensure you have a basic understanding of:</p>
<ul>
<li><strong>Formal theorem proving</strong> and the <strong>Lean 4</strong> environment (syntax, tactics, and proofs)</li>
<li><strong>Large language models</strong> (LLMs) and their training paradigms (supervised learning, reinforcement learning)</li>
<li><strong>Chain-of-thought (CoT) reasoning</strong> in prompting LLMs</li>
<li>Basic <strong>machine learning concepts</strong> such as datasets, fine-tuning, and reward signals</li>
</ul>
<p>Familiarity with the original DeepSeek-Prover (V1) is helpful but not required. The guide focuses on V2's unique recursive proof search.</p>
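<p>If you want to check your Lean 4 footing before proceeding, the following is a minimal, hypothetical example of the syntax and tactic style referenced throughout this guide (the theorem name is illustrative, not from the paper):</p>
<pre><code class="language-lean">-- A minimal Lean 4 proof in tactic mode: state a goal after `:`,
-- then discharge it with tactics after `by`.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
</code></pre>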
<h2 id="step-by-step">Step-by-Step Instructions: The Training Pipeline of DeepSeek-Prover-V2</h2>
<h3 id="cold-start">1. Cold-Start Data Generation via Recursive Decomposition</h3>
<p>The process begins without any existing formal proof data for complex theorems. Instead, it uses a powerful base model (<strong>DeepSeek-V3</strong>) to generate high-quality synthetic data.</p>
<ol>
<li><strong>Prompt DeepSeek-V3</strong> with a complex mathematical theorem (e.g., a lemma from number theory). Instruct it to decompose the theorem into a sequence of simpler subgoals and formalize each step in Lean 4 syntax.</li>
<li><strong>Generate subgoals</strong>: DeepSeek-V3 outputs a list of intermediate lemmas that, if proven, entail the original theorem.</li>
<li><strong>Search each subgoal</strong>: A smaller 7B-parameter prover model attempts to prove each subgoal independently using standard tactics. This search is computationally light because subgoals are simpler.</li>
<li><strong>Assemble the proof</strong>: When all subgoals are proven, combine them with the original decomposition to form a complete formal proof. The informal chain-of-thought reasoning (CoT) from DeepSeek-V3 is paired with the formal steps.</li>
</ol>
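<p>The four steps above can be sketched as a short orchestration loop. This is a conceptual illustration, not the official DeepSeek code: <code>decompose</code>, <code>prove_subgoal</code>, and the dictionary layout are placeholder stand-ins for calls to DeepSeek-V3, the 7B prover, and the final training-example format.</p>
<pre><code class="language-python"># Conceptual sketch of the cold-start pipeline (hypothetical helpers,
# not the official implementation).

def decompose(theorem: str) -> list[str]:
    """Placeholder for DeepSeek-V3: split a theorem into subgoals."""
    return [f"{theorem}::subgoal_{i}" for i in range(2)]

def prove_subgoal(subgoal: str) -> str | None:
    """Placeholder for the 7B prover: return a Lean proof or None."""
    return f"by simp  -- proof of {subgoal}"

def cold_start_example(theorem: str) -> dict | None:
    """Build one training example only if every subgoal is proven."""
    subgoals = decompose(theorem)
    proofs = [prove_subgoal(g) for g in subgoals]
    if any(p is None for p in proofs):
        return None  # discard: at least one subgoal remained unproven
    return {
        "theorem": theorem,
        "cot": f"Decomposed into {len(subgoals)} subgoals",
        "formal_proof": "\n".join(proofs),  # concatenated subgoal proofs
    }

example = cold_start_example("A_implies_B")
</code></pre>
<p>The key design choice is the filter: an example is kept only when the assembled proof is complete, so every training pair couples a full informal plan with a fully verified formal proof.</p>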
<p><em>Example</em> (conceptual): For theorem "A implies B", DeepSeek-V3 might break it into "A implies C" and "C implies B", then formalize each. The 7B model solves those subgoals, and the final training example includes the CoT + Lean code.</p>
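<p>In Lean 4, that decomposition pattern looks roughly like the following (a hypothetical illustration with abstract propositions, not an example from the paper): the intermediate fact <code>C</code> becomes a <code>have</code> step, mirroring a proven subgoal.</p>
<pre><code class="language-lean">-- Decomposing `A → B` through an intermediate `C`:
-- each `have`/`exact` step corresponds to a proven subgoal.
theorem a_implies_b (A B C : Prop)
    (hAC : A → C) (hCB : C → B) : A → B := by
  intro hA
  -- Subgoal 1: derive C from A
  have hC : C := hAC hA
  -- Subgoal 2: derive B from C
  exact hCB hC
</code></pre>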
<h3 id="reinforcement-learning">2. Reinforcement Learning from Subgoal-Proven Data</h3>
<p>After the cold-start phase, the team curates a subset of challenging problems that the 7B prover could not solve end-to-end but for which all subgoals were proven successfully.</p>
<ol>
<li><strong>Construct complete proofs</strong>: By concatenating the formal proofs of each subgoal, a full proof for the original problem is obtained.</li>
<li><strong>Create unified training examples</strong>: Each example pairs the informal CoT (outlining the decomposition) with the formal proof steps.</li>
<li><strong>Fine-tune the main prover model</strong> (DeepSeek-Prover-V2) on this synthetic dataset using standard supervised learning.</li>
<li><strong>Apply reinforcement learning</strong>: Use a binary reward signal (proof correct or incorrect) to further optimize the model. The reward is derived from Lean 4's verification result.</li>
</ol>
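<p>The reward computation in step 4 is simple to state precisely. The sketch below is an illustration under stated assumptions: <code>lean_verifies</code> stands in for actually running the Lean 4 checker, and the group-relative advantage is one common way (e.g., GRPO-style) such a binary reward can feed a policy-gradient update; the paper's exact RL recipe may differ.</p>
<pre><code class="language-python"># Binary reward derived from formal verification (conceptual sketch).

def lean_verifies(proof: str) -> bool:
    """Placeholder for Lean 4 verification of a candidate proof.
    Here: any proof containing `sorry` is treated as incomplete."""
    return "sorry" not in proof

def reward(proof: str) -> float:
    """Binary reward: 1.0 iff the proof checks, 0.0 otherwise."""
    return 1.0 if lean_verifies(proof) else 0.0

def advantages(proofs: list[str]) -> list[float]:
    """Center each sampled proof's reward on the group mean,
    as in group-relative policy optimization (an assumption)."""
    rs = [reward(p) for p in proofs]
    mean = sum(rs) / len(rs)
    return [r - mean for r in rs]

# Three sampled candidate proofs for one problem:
adv = advantages(["by simp", "sorry", "by ring"])
</code></pre>
<p>Because the reward comes from the verifier rather than a learned critic, it cannot be gamed: a proof either checks or it does not.</p>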
<p>This phase teaches the model to generate both the high-level plan and the low-level tactics in a unified manner.</p><figure style="margin:20px 0"><img src="https://i0.wp.com/miro.medium.com/v2/resize%3Afit%3A700/1%2AA4FJp063Twh0PPL5qxWlCQ.png?w=950&#038;ssl=1" alt="Mastering Neural Theorem Proving: A Step-by-Step Guide to DeepSeek-Prover-V2's Recursive Proof Search" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: syncedreview.com</figcaption></figure>
<h3 id="final-model">3. The Resulting Model and Benchmarking</h3>
<p>The final <strong>DeepSeek-Prover-V2-671B</strong> (671 billion parameters) is evaluated on:</p>
<ul>
<li><strong>MiniF2F-test</strong>: Achieves an 88.9% pass rate, surpassing previous neural provers.</li>
<li><strong>PutnamBench</strong>: Solves 49 out of 658 problems from the prestigious Putnam competition.</li>
<li><strong>ProverBench</strong>: A new benchmark of formalized problems released alongside the model, spanning competition-style and textbook-level mathematics.</li>
</ul>
<p>The model's proofs on MiniF2F are publicly available, allowing the community to verify and build upon them.</p>
<h2 id="common-mistakes">Common Mistakes and How to Avoid Them</h2>
<ul>
<li><strong>Insufficient decomposition</strong>: Failing to break theorems into sufficiently small subgoals leads to proof search failures. Ensure each subgoal is simple enough that the 7B prover can close it within a short tactic search.</li>
<li><strong>Ignoring informal reasoning</strong>: The chain-of-thought from DeepSeek-V3 is crucial. Omitting it reduces the model's ability to generalize. Always include the informal plan in training examples.</li>
<li><strong>Overfitting to synthetic data</strong>: The cold-start data may contain biases. Use reinforcement learning with binary rewards to correct mistakes and encourage novel proof strategies.</li>
<li><strong>Neglecting benchmarking</strong>: When applying the approach to new domains, create a diverse evaluation set like ProverBench to catch overfitting.</li>
<li><strong>Underestimating compute</strong>: The 671B model requires substantial hardware. Consider using the 7B prover for initial experiments before scaling up.</li>
</ul>
<h2 id="summary">Summary</h2>
<p>DeepSeek-Prover-V2 introduces a recursive proof search framework that leverages a powerful LLM to decompose theorems, a smaller model to solve subgoals, and reinforcement learning to unify informal and formal reasoning. By understanding the cold-start data generation and RL fine-tuning steps, researchers can replicate or adapt this approach to advance automated theorem proving. Key takeaways: use DeepSeek-V3 for decomposition, the 7B prover for subgoal search, and binary reward signals for refinement. The model's state-of-the-art results on MiniF2F and PutnamBench demonstrate its effectiveness.</p>