The proof of correctness for Prim’s minimum spanning tree (MST) algorithm holds a special place in my heart, because it was what turned me into a computer scientist. When I first saw this, I was studying Electrical Engineering, Computer Science, and Mathematics, and while programming was fun and creative it seems pretty straight-forward and even a little boring. You wrote code that obviously worked, right? The first time I saw greedy algorithms for the MST problem (both Prim’s algorithm and Kruskal’s algorithm), it was not clear at all that they worked – that they always found the minimum spanning tree. Seeing the proof of correctness was a relevation, and seemed a little like magic. Apply these simple rules and you’ll always compute the optimal solution. Always. And the proof establishes this. I continued to study electrical engineering and mathematics, but this topic is what led to my lifelong focus on computer science.
I first wrote the proof below when I taught data structures in Fall 2016 at UNC Greensboro. The prerequisite for data structures was discrete mathematics (called “Foundations of Computer Science” in the UNCG program), which used the textbook by James L. Hein. This is based on the presentation of Prim’s algorithm in that book, re-written slightly below.
Intuitive description of algorithm
\begin{algorithm} \caption{Prim's Algorithm for computing minimum spanning trees} \begin{algorithmic} \Procedure{Prims}{} \State $S\leftarrow \emptyset$ \State Pick any $v\in V$ \State $W\leftarrow \{v\}$ \While {$W\neq V$} \State $\{x,y\}\leftarrow$ min weight edge with $x\in W$ and $y\in V-W$ \State $S\leftarrow S\cup\{\{x,y\}\}$ \State $W\leftarrow W\cup \{y\}$ \EndWhile \Return $S$ \EndProcedure \end{algorithmic} \end{algorithm}
First note that S is clearly a spanning tree….
Theorem 1
If
S is the spanning tree selected by Prim’s algorithm for input graph
G=(V,E), then
S is a minimum-weight spanning tree for
G.
The proof is by contradiction, so assume that S is not minimum weight. Let ES=(e_1,e_2,\cdots,e_{n-1}) be the sequence of edges chosen (in this order) by Prim’s algorithm, and let U be a minimum-weight spanning tree that contains edges from the longest possible prefix of sequence ES.
Let e_i=\{x,y\} be the first edge added to S by Prim’s algorithm that is not in U, and let W be the set of vertices immediately before \{x,y\} is selected. Notice that it follows that U contains edges e_1,e_2,\cdots,e_{i-1} but not edge e_i.
There must be a path x\leadsto y in U, so let \{a,b\} be the first edge on this path with one endpoint (a) inside W, and the other endpoint (b) outside W, as in the following picture:
Define the set of edges T=U+\{\{x,y\}\}-\{\{a,b\}\}, and notice that T is a spanning tree for graph G. Consider the three possible cases for the weights of edges \{x,y\} and \{a,b\}:
Case 1, w(\{a,b\})>w(\{x,y\}): In this case, in creating T we have added an edge that has smaller weight then the one we removed, and so w(T)<w(U). However, this is impossible, since U is a minimum-weight spanning tree.
Case 2, w(\{a,b\})=w(\{x,y\}): In this case w(T)=w(U), so T is also a minimum spanning tree. Furthermore, since Prim’s algorithm hasn’t selected edge \{a,b\} yet, that edge cannot be one of e_1,e_2,\cdots,e_{i-1}. This implies that T contains edges e_1,e_2,\cdots,e_i, which is a longer prefix of ES than U contains. This contradicts the definition of tree U.
Case 3, w(\{a,b\})<w(\{x,y\}): In this case, since the weight of edge \{a,b\} is smaller, Prim’s algorithm will select \{a,b\} at this step. This contradicts the definition of edge \{x,y\}.
Since all possible cases lead to contradictions, our original assumption (that S is not minimum-weight) must be invalid. This proves the theorem.