Graph search algorithms solve the following general problem:
Suppose you have a graph, and you begin at some node on the graph. This graph, for example, could represent a map of New York City — the edges could be streets, nodes could be intersections of streets, and the starting node could be your current location. The problem is to find a path with some property that ends at some goal node. On the graph of New York City, you might want to find the shortest route to a bar. On a graph of a game of chess (where nodes are game states and edges are moves), you might want to find any way for you to win. For our purposes, we will assume that the graph is connected, meaning there exists a path between every pair of nodes.
For now, we will consider two graph search algorithms: depth-first and breadth-first search. Depth-first search (DFS) simply finds any path from the start node to a goal node, while breadth-first search finds the shortest path. DFS starts at the starting node and takes an edge over and over again until it reaches a goal node. If it reaches a node from which it cannot proceed and has still not reached a goal node, it will backtrack up one level and try again with another node. If the graph is infinitely large, DFS may go on forever.
Breadth-first search (BFS) takes a different strategy: starting from the start node, it iteratively expands a “search bubble” containing all nodes within a certain distance. It then looks for a goal node within this bubble. If it does not find one, it expands the bubble. Even if the graph is infinitely large, BFS will find a goal node if one exists, and the path it finds will be the shortest.
We should first think about how we can represent a graph. The important thing about a graph is that each node consists of some unique “label” and a list of “connections” to other nodes. Furthermore, if we have information about just a single node in the graph, we can recover the entire graph by just following the connections from each node to the next. If we do this enough times (until we start revisiting nodes), we can determine what the entire graph looks like. Therefore, we only need a representation of a single node (the node’s label and its connections), and the entire graph can be reconstructed automatically. We will call the node we choose to represent the entire graph with the “focus” of the graph. In Haskell, we can represent a graph the following way:
data Graph a = Node { label :: a, connections :: [Graph a] }
Therefore, we can represent a simple cyclic graph with 5 vertices the following way:
cycle5 :: Graph Int
cycle5 = Node 1 [Node 2 [Node 3 [Node 4 [Node 5 [cycle5]]]]]
Note how, just like lists and trees, graphs in Haskell are defined recursively. This means that if we were to traverse this graph using some search procedure, we would keep cycling from 1 to 5.
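To see this cycling concretely, here is a small self-contained sketch; firstPath is a hypothetical helper (not part of the post’s code) that follows the first connection of each node, collecting labels lazily as it goes:

```haskell
data Graph a = Node { label :: a, connections :: [Graph a] }

cycle5 :: Graph Int
cycle5 = Node 1 [Node 2 [Node 3 [Node 4 [Node 5 [cycle5]]]]]

-- firstPath is a hypothetical helper: it takes the first connection of
-- each node over and over (stopping only at a node with no connections),
-- collecting labels along the way.
firstPath :: Graph a -> [a]
firstPath (Node x [])      = [x]
firstPath (Node x (g : _)) = x : firstPath g

main :: IO ()
main = print (take 7 (firstPath cycle5))  -- [1,2,3,4,5,1,2]
```

Because the graph is defined recursively, firstPath produces an infinite list that keeps cycling through 1 to 5; laziness lets us take a finite prefix of it.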
It is often useful to prune the graph into a search tree so as to remove any cycles. In other words, we can transform cycle5 into the following:
pruned_cycle5 :: Graph Int
pruned_cycle5 = Node 1 [Node 2 [Node 3 [Node 4 [Node 5 [Node 1 []]]]]]
And it will be understood that the node labeled 1 later in the graph is the same as the node labeled 1 that is the focus of the graph. Pruning turns the infinitely large, recursively defined representation of the graph into a finite representation that can be used to reconstruct that very same graph. We can define the pruning function in the following way:
prune :: Eq a => Graph a -> Graph a
prune g = prune' [] g
  where
    prune' ps (Node x xs) = if x `elem` ps
                            then Node x []
                            else Node x (map (prune' (x:ps)) xs)
The prune function will traverse the graph by following connections and maintaining a list of node labels it has seen so far; if it encounters a node it has seen before, it won’t expand its connections. Pruning is useful in both DFS and BFS. Without pruning, DFS may not terminate (if it encounters a cycle in the graph). Although BFS will always terminate, pruning speeds it up by eliminating cycles in the graph. This does not change the result of the BFS, because the shortest path can never contain a cycle: if it did, we could cut the cycle out of the “shortest” path and get a shorter path. We will always prune the graph before feeding it into a search algorithm.
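As a quick check on pruning, here is a self-contained sketch; labels is a hypothetical helper (not in the post) that flattens a finite, already-pruned graph into a preorder list of its labels:

```haskell
data Graph a = Node { label :: a, connections :: [Graph a] }

prune :: Eq a => Graph a -> Graph a
prune g = prune' [] g
  where
    prune' ps (Node x xs) = if x `elem` ps
                            then Node x []
                            else Node x (map (prune' (x:ps)) xs)

cycle5 :: Graph Int
cycle5 = Node 1 [Node 2 [Node 3 [Node 4 [Node 5 [cycle5]]]]]

-- labels is a hypothetical helper that lists the labels of a finite
-- (pruned) graph in preorder. It would not terminate on cycle5 itself.
labels :: Graph a -> [a]
labels (Node x xs) = x : concatMap labels xs

main :: IO ()
main = print (labels (prune cycle5))  -- [1,2,3,4,5,1]
```

The trailing 1 is the cut-off copy of the focus node: prune reached it a second time and gave it an empty connection list.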
The easier search algorithm to implement is DFS because we only need to find a single path to a goal node, not necessarily the shortest one. In Python, DFS would be implemented with a stack like so (assuming the graph has already been pruned):
def dfs(isGoal, startNode):
    stack = [[startNode]]
    while len(stack) != 0:
        path = stack.pop()
        if isGoal(path[-1]):
            return path
        else:
            for child in path[-1].connections():
                stack.append(path + [child])
    return None
Because we are using a stack, a last-in-first-out (LIFO) data structure, we always pop out the last path we extended. If we end up at a dead end, we pop the most recently extended path and continue from there, which enables backtracking. In Haskell, we don’t need a separate stack to store all the paths: we can use recursion, which essentially uses the call stack as our LIFO structure. Here’s how we can do it:
import Control.Monad (msum)

dfs :: (a -> Bool) -> Graph a -> Maybe [a]
dfs p (Node x xs) = if p x
                    then Just [x]
                    else fmap (x:) . msum . map (dfs p) $ xs
First, let’s inspect the type signature and convince ourselves it’s the same as the Python program. The first argument is the “isGoal” parameter, which is a function that takes in a node label as an argument and returns a Boolean indicating whether or not it qualifies as a goal node. The second argument is a graph, with the starting node as its focus. The dfs function can either output a list of node labels representing a path from the starting node to a goal node, wrapped in a Just constructor, or it can output Nothing, indicating that there is no path from the starting node to a goal node. In the definition, p is the goal predicate, x is the label of the node in focus, and xs is the list of connections for that node (recall that it is of type [Graph a]).
The first part of the if-branch is relatively simple: if the currently focused node, x, satisfies the goal predicate, then the path to a goal is Just [x]; that is, we’re already at a goal node. To understand the second part, let’s first try to understand its components:

- map (dfs p) takes a list of graphs and runs the same DFS procedure on all of them, with the same goal predicate. It will output a list of Maybe [a] values.
- msum takes a list of Maybe t values and returns the first one that is not Nothing. The “m” in msum :: MonadPlus m => [m a] -> m a refers to “MonadPlus”, in this case the Maybe monad. We can think of Nothing as a sort of “zero” element, and we are summing up all the Maybe t values in a list. Because zero is the identity for addition, it will be ignored in msum and we will, by convention, return the first Just value.
- fmap (x:) prepends x to the list inside a Just. fmap :: Functor f => (a -> b) -> f a -> f b is a “lifting” operation for functors, which are generally containers for elements. In this case, our functor is Maybe. As an example:

(+1) :: Int -> Int
fmap (+1) :: Maybe Int -> Maybe Int
fmap (+1) (Just 3) = Just 4
fmap (+1) Nothing = Nothing
fmap (1:) (Just [2, 3, 4]) = Just [1, 2, 3, 4]
fmap (1:) Nothing = Nothing
Note that fmap will return Nothing if you try to operate on Nothing.
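To make msum’s behavior concrete, a couple of examples of my own (not from the post), following directly from the description above:

```haskell
import Control.Monad (msum)

main :: IO ()
main = do
  -- msum returns the first value that is not Nothing:
  print (msum [Nothing, Just [2, 5], Just [9]])  -- Just [2,5]
  -- msum of an empty list is Nothing, the "zero" element:
  print (msum ([] :: [Maybe [Int]]))             -- Nothing
```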
Putting this all together: if x does not satisfy the goal predicate, then we run dfs p on all the connections of the focused node, find the first one that succeeds in finding a goal node, and prepend x to the beginning of that path, because x had to be visited before traversing that path. If DFS fails to find a goal node through any of the connections, then because fmap f Nothing = Nothing for all f, Nothing will be returned. Furthermore, msum [] = Nothing (intuitively, the sum of zero things is zero), so if a node does not satisfy the goal predicate and has no connections, Nothing will be returned, as we expect. In all cases, therefore, this dfs function is correct.
At first glance, it looks like this implementation runs DFS on every single connection of a node, whereas DFS should only run along a single path and backtrack only when necessary. Thanks to laziness, however, that is not the case. msum will only evaluate the contents of the list map (dfs p) xs until it finds the first successful path, and DFS will not be run on the remainder of the list. Backtracking is not as explicit in this implementation, but it occurs when msum encounters a Nothing in the list, ignores it, and moves on to the next element.
We can confirm that the entire graph is not being searched by using one of the tools in the Haskell toolbox, the :sp directive (short for :sprint) in GHCi. It will show you how a given variable is currently represented in memory: which portions are thunks and which portions have been evaluated. For example, if we define:
let x = 1 : 2 : 3 : 4 : []
Then:
GHCi> :sp x
====> x = _
Which indicates that, because of lazy evaluation, x is entirely a thunk. Now, if we evaluate the head of x:
GHCi> head x
====> 1
GHCi> :sp x
====> x = 1 : _
Which indicates that the only portion of x that has been evaluated is the head; the tail is still a thunk. We can use :sp to determine which portions of a data structure a function evaluates, in this case, which nodes in a graph our DFS procedure touches. Let’s try it out:
GHCi> let g = Node 1 [Node 2 [Node 3 []], Node 4 [Node 5 []], Node 5 []]
GHCi> let pg = prune g
GHCi> :sp pg
====> pg = _
GHCi> dfs (== 5) pg
====> Just [1, 4, 5]
GHCi> :sp pg
====> pg = Node 1 (Node 2 [Node 3 []] : Node 4 (Node 5 _ : _) : _)
Clearly, it does not have to process the entire search tree — once it finds a path to 5, it stops and returns that path.
BFS is a bit more difficult to tackle. But first, let’s discuss how we would implement BFS in Python:
def bfs(isGoal, startNode):
    queue = [[startNode]]
    while len(queue) != 0:
        path = queue.pop(0)
        if isGoal(path[-1]):
            return path
        else:
            for child in path[-1].connections():
                queue.append(path + [child])
    return None
The only difference is that instead of a stack, we have a first-in-first-out (FIFO) queue! The basic idea is to preserve the invariant that in each iteration, we always pop the shortest path out of the queue. This is because every time we extend a path, we add it to the end of the queue, and we always pop from the front, selecting the paths that haven’t been extended yet.
In Haskell, however, the implementations of DFS and BFS aren’t quite so similar. What complicates things is that while stacks can be implemented easily using ordinary recursion, queues don’t have a simple analogue. We would like to avoid using an extra data structure, if possible, and stick to recursion.
Recall that BFS accomplishes two things: it finds the shortest path to a goal node, and it does not touch any part of the graph outside the search bubble. Therefore, if we find that the shortest path to a goal node is 3 steps long, then all nodes that are more than 3 steps away from the starting node should remain untouched after the algorithm terminates.
First, let’s suppose we had a function, rankedPaths, which takes in a graph and outputs a list of all paths from the focus of the graph to every other node, in order of increasing length. For now, we can simply leave it as undefined:
rankedPaths :: Graph a -> [[a]]
rankedPaths g = undefined
Leaving it as undefined for now allows us to specify a type for rankedPaths without having to implement it, so that we can specify how it will interlock with other functions but put off actually writing it until later.
We also have a function in the Data.List module called find :: (a -> Bool) -> [a] -> Maybe a, which finds the first element in a list that satisfies a given predicate. If it finds one, it returns that element wrapped in a Just constructor; if it does not, it returns Nothing. Finally, we also have a function called last :: [a] -> a, which returns the last element of a list. How can we put these together to implement BFS?
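For instance (examples of my own, assuming only Data.List’s find and the Prelude’s last):

```haskell
import Data.List (find)

main :: IO ()
main = do
  print (find even [3, 5, 8, 10])  -- Just 8: the first even element
  print (find (> 100) [1, 2, 3])   -- Nothing: no element satisfies it
  print (last [1, 2, 3])           -- 3
```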
What we want to do is implement a function bfs :: (a -> Bool) -> Graph a -> Maybe [a] that will first generate a list of all paths in order of increasing length, and then select the first path whose last node satisfies our goal test. The following code does just that:
bfs :: (a -> Bool) -> Graph a -> Maybe [a]
bfs p = find (p . last) . rankedPaths
However, we need to make sure that rankedPaths is evaluated lazily: find will stop as soon as it finds an element that satisfies the given predicate, but we need to make sure that evaluating the first element of the list does not force evaluation of the entire list. We want only the nodes inside the search bubble to be touched.
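Conveniently, find itself is lazy in exactly this way; as a small demonstration (my own example), it terminates even on an infinite list, as long as a match exists:

```haskell
import Data.List (find)

main :: IO ()
main = print (find (> 10) [1 ..])  -- Just 11: stops at the first match
```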
To implement rankedPaths, we have to ensure that evaluating a given element of the list will not force the evaluation of any elements that come after it; we will do so by defining the tail of the returned list recursively and the head directly. First, note that if our graph is Node x xs (recall that x will pattern-match as the label of the node currently in focus, and xs will pattern-match as the list of connections of this node), then the shortest path in this graph is [x], which is just a path starting and ending at x. The next shortest paths are the paths from x to its connections, then the paths from x to the connections of its connections, and so on.
So now we’ve learned a bit more about what rankedPaths should look like:
rankedPaths :: Graph a -> [[a]]
rankedPaths (Node x xs) = [x] : undefined
Next comes the induction step: suppose we knew map rankedPaths xs, that is, rankedPaths for every connection of the currently focused node. Can we combine this data to get rankedPaths for the currently focused node? In other words, if we have ranked lists of paths for all the connections of a node, can we combine them into a single ranked list of paths for the node?
We can, if we imagine we had a function called mergeRanked :: [[a]] -> [[a]] -> [[a]]. This function takes two ranked lists of paths and merges them into a single ranked list of paths. Then, we can use foldr mergeRanked [] to merge a list of ranked lists of paths into a single ranked list of paths, and prepend x to the beginning of each path using map (x:). Finally, we also have to consider the case of the single path beginning and ending at x, so we prepend the path [x] to the beginning of the entire resulting list. The following is how we can implement rankedPaths:
rankedPaths :: Graph a -> [[a]]
rankedPaths (Node x xs) = ([x]:) . map (x:) . foldr mergeRanked [] . map rankedPaths $ xs
Of course, we still have to implement mergeRanked. The basic idea is similar to the merge step in merge sort: we repeatedly take the shorter of the two heads until one of the lists is empty:
mergeRanked :: [[a]] -> [[a]] -> [[a]]
mergeRanked [] ys = ys
mergeRanked xs [] = xs
mergeRanked (x:xs) (y:ys) = if length x < length y
                            then x : mergeRanked xs (y:ys)
                            else y : mergeRanked (x:xs) ys
Altogether, we have:
import Data.List (find)

bfs :: (a -> Bool) -> Graph a -> Maybe [a]
bfs p = find (p . last) . rankedPaths
  where
    rankedPaths :: Graph a -> [[a]]
    rankedPaths (Node x xs) = ([x]:) . map (x:) . foldr mergeRanked [] . map rankedPaths $ xs
    mergeRanked :: [[a]] -> [[a]] -> [[a]]
    mergeRanked [] ys = ys
    mergeRanked xs [] = xs
    mergeRanked (x:xs) (y:ys) = if length x < length y
                                then x : mergeRanked xs (y:ys)
                                else y : mergeRanked (x:xs) ys
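As a quick sanity check (an example of my own, not from the post), we can run rankedPaths on a small tree and confirm that paths come out in nondecreasing order of length:

```haskell
data Graph a = Node { label :: a, connections :: [Graph a] }

rankedPaths :: Graph a -> [[a]]
rankedPaths (Node x xs) = ([x]:) . map (x:) . foldr mergeRanked [] . map rankedPaths $ xs

mergeRanked :: [[a]] -> [[a]] -> [[a]]
mergeRanked [] ys = ys
mergeRanked xs [] = xs
mergeRanked (x:xs) (y:ys) = if length x < length y
                            then x : mergeRanked xs (y:ys)
                            else y : mergeRanked (x:xs) ys

main :: IO ()
main = print (rankedPaths (Node 1 [Node 2 [], Node 3 [Node 4 []]]))
  -- [[1],[1,3],[1,2],[1,3,4]]
```

Note that between equal-length paths the order depends on mergeRanked’s tie-breaking: when the two heads are the same length, the else branch takes the head of the second list first, which is why [1,3] comes before [1,2] here. Ranking by length is unaffected.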
And that’s the implementation of BFS! Side by side with the Python implementation, it isn’t quite as terse, but it idiomatically uses recursion and laziness to our advantage. We can confirm that BFS is actually lazy and doesn’t touch anything outside the search bubble by using the :sp tool again:
GHCi> let g = Node 1 [Node 2 [Node 3 []], Node 4 [Node 5 []], Node 5 []]
GHCi> let pg = prune g
GHCi> :sp pg
====> pg = _
GHCi> bfs (== 5) pg
====> Just [1, 5]
GHCi> :sp pg
====> pg = Node 1 [Node 2 [Node 3 _], Node 4 _, Node 5 _]
And indeed, nothing more than one step beyond our search bubble has been touched.
Please leave comments below and let me know if you have any corrections or need any clarifications.
“Furthermore, if we have information about just a single node in the graph, we can recover the entire graph by just following the connections of each node to the next node.”
So long as it is connected (which the usual definition of a graph does not necessitate).
Hi Harold,
Thanks for your input — the post has been updated with the assumption that the graph is connected.
Thanks for the post. It would be interesting to also see some performance figures. I remember being quite frustrated about poor performance of my Dijkstra code in Haskell (finding shortest paths is more difficult, of course, and perhaps worth another post?).