Monoids, Functors, Applicatives, and Monads: 10 Main Ideas

I’m making this post because after spending months learning about all the different properties of monoids, functors, applicatives, and monads, I thought it would be nice to collect everything I’ve learned from different tutorials, StackOverflow questions, and reddit posts into a neat beginner’s introduction to these topics. I’ll break up this post into a few Main Ideas and say a little about each.

Main Idea #1: Monoids, functors, applicatives, and monads are all different algebras

Types like IO a are not monads in themselves; rather, they have certain properties about how their values can be combined, and those properties mean they follow the monad algebra. Just like we can use the natural numbers to add or multiply, we can use IO in a way that is monadic or applicative. In some cases, IO values can also be combined using the monoid algebra, and they can always be transformed using the properties of functors. Lists are a common example of functors as well, but you can also combine them in other ways that follow a variety of other algebras. But long story short: there’s no such thing as the “IO monad” or the “List monad”, just the ability to use certain properties of IO and List to combine values in useful ways.

Main Idea #2: Monoids, functors, applicatives, and monads are ideas about computation, and are not just specific to Haskell

Because lists, tuples, and a variety of similar types can be found in almost all high-level programming languages, you will see examples of where values are combined or transformed in a way that makes you go, “hey, applicative!”. All of these algebras are ideas from category theory that are also useful in programming. However, Haskell is particularly expressive in its ability to specify which types follow which sorts of algebras, and so you hear Haskellers talk about them a lot.

Main Idea #3: Functors transform values inside containers, while applicatives and monads combine values inside containers

In Haskell, functions of type a -> b transform values of type a into values of type b. For example, the (+1) :: Int -> Int function increments a number by 1, and the show :: Int -> String function turns a number into its string representation. But what if your value was enclosed in some sort of container? For example, if you know how to turn an integer into a string, do you know how to turn a list of integers into a list of strings? Yes, that’s the map function!

map (+1) [1, 2, 3, 4] = [2, 3, 4, 5]
map show [1, 2, 3, 4] = ["1", "2", "3", "4"]

That is the idea behind the functor algebra. If something is a functor, then there is a function called fmap that will lift transformations from a -> b to f a -> f b. fmap specialized to lists, for example, has the type (a -> b) -> ([a] -> [b]), while fmap specialized to Maybe has the type (a -> b) -> (Maybe a -> Maybe b).
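
For instance, using fmap at the Maybe type:

fmap show (Just 3) = Just "3"
fmap show Nothing  = Nothing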

We can also have a container with function(s) of type a -> b in it and another container with value(s) of type a in it, and try to combine the two to get a container with value(s) of type b in it. An example specialized to lists is the following:

[(*2), (+1)] <*> [1, 2, 3] = [2, 4, 6, 2, 3, 4]

Here, the <*> function applies each function from the first list to each value from the second list, in a way reminiscent of list comprehensions. For a container of type Maybe, you might use the <*> function like this:

Just (+1) <*> Just 3 = Just 4
Nothing <*> Just 3 = Nothing
Just (+1) <*> Nothing = Nothing

The idea we’re expressing here is that if one computation fails (if one of the values is a Nothing), then the result also fails. That is why the Maybe a type is used when we want to deal with computations that can fail (e.g., error handling).

Finally, we also have a function for creating a container with a single value inside it, called pure. pure can be used in the following ways:

pure "hi" = ["hi"] -- for lists
pure "hi" = Just "hi" -- for Maybe types
pure "hi" = return "hi" -- for IO values
-- ... and so on for every other container type

The functions <*> and pure together constitute the applicative algebra.

Main Idea #4: Things that are monads can be combined using a special type of function composition

We talked about how things that are applicatives can be combined using the <*> function, and later we’ll learn how monoids have their own rules for combining things. Monads also have to do with combining containers, and they do so using what is essentially a special kind of function composition.

Function composition has two main ideas:

  1. If you have two functions, f :: a -> b and g :: b -> c, then you can create a new function g . f :: a -> c simply by taking the input value, running f on it first, and then running g on the result of that. As you can see, function composition is related to the “sequence” in which you run computations, which will be a key idea with monads.
  2. There is an identity function, id :: a -> a, that simply outputs what you give it.

It also comes with two laws:

  1. Function composition is associative: f . (g . h) = (f . g) . h. The parentheses are therefore redundant, and we often just write f . g . h to mean either grouping.
  2. Composition with the identity function has no effect: id . f = f . id = f.

Let’s look at the types of (.) and id:

(.) :: (b -> c) -> (a -> b) -> (a -> c)
id :: a -> a
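
For instance:

(show . (+1)) 3 = "4"
(id . show) 3   = "3"
(show . id) 3   = "3"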

Now, suppose that we’re dealing with a different space (a different “category”) of functions: functions that take in ordinary values but return values inside containers. For example, if we’re dealing with computations that might fail, all our functions might be of the type a -> Maybe b for some a and b. Turns out we also have two analogous functions for dealing with this space of functions:

(<=<) :: (b -> f c) -> (a -> f b) -> (a -> f c)
return :: a -> f a

The return function is the same as the pure function from the applicative world: it puts a value inside a container.

These two functions define the monad algebra.
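
As a small sketch with Maybe (safeSqrt and safeRecip are made-up helpers, not library functions), monadic composition chains two computations that can each fail:

import Control.Monad ((<=<))

safeSqrt :: Double -> Maybe Double
safeSqrt x = if x < 0 then Nothing else Just (sqrt x)

safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeRecipSqrt :: Double -> Maybe Double
safeRecipSqrt = safeRecip <=< safeSqrt   -- fails if either step fails

-- safeRecipSqrt 4    = Just 0.5
-- safeRecipSqrt (-4) = Nothing
-- safeRecipSqrt 0    = Nothing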

We can use the monad algebra, for example, when we want to compose different IO actions together. The IO function readFile :: FilePath -> IO String reads text from the file at a given path, and the putStr :: String -> IO () function prints out some text to standard output. If we wanted to combine these functions together into some printFile :: FilePath -> IO () function that reads a file at the given path and prints out its contents, we could try to use ordinary function composition, but the types wouldn’t quite work out. The function composition operator knows how to feed the output value of one function as the input of another, but it doesn’t know how to deal with the side effects resulting from running each IO action. But monadic function composition does know how to compose the side effects of each action, and the types work out too. Therefore, we can define:

printFile = putStr <=< readFile

And it does what we expect.

Main Idea #5: You can put values inside a container but you can’t always take them out

For both the monad and the applicative algebras, you have a function that takes a value and wraps it in a specific container type. This is always a total function, and can be done without any loss of information. However, you can’t always take a value out of its container (f a -> a) without some loss of information or generality. For Maybe Int, for example, it is easy to extract 3 when you are given Just 3, but how can you extract a value out of Nothing? With [a], you lose information when you extract a single value from a list with multiple values in it, and you fail outright when you try to extract a value from an empty list. With IO a values, Haskell prohibits us from ever removing the “IO” tag because we care about distinguishing computations with side effects from pure computations. Therefore, when dealing with values inside a particular container type, we cannot in general extract a value from that container, but if the container follows the algebras we discussed, we are guaranteed four kinds of operations that will not fail:

  1. We can transform the value inside the container using fmap (functors)
  2. We can combine a container of functions and a container of values together using <*> (applicatives)
  3. We can compose functions that produce containers of values together in the order we specify using <=< (monads)
  4. And we can take a single value and wrap it inside a container using pure (applicatives) or return (monads)

Main Idea #6: Applicatives and monads offer an elegant way to manage side effects and state

Although we have briefly touched upon the notion of failure, we’ve mostly talked about applicatives and monads as the common algebra of different container types. In a different viewpoint, we can say that applicatives and monads are the algebra of things that have side effects. It is hard to define what a side effect is, but generally, if a function produces a side effect, it means it produces something “extra” in addition to its output value. A typical example is a function of the form a -> Maybe b, which can sometimes produce an ordinary value (e.g., Just 's') and sometimes an error value (Nothing). Lists provide the side effect of having an arbitrary number of return values, and other container types like trees provide a similar side effect except the return values also have some structural relationships between them.

The side-effects viewpoint also allows us to make sense of the idea of “state”. Sometimes, a function will produce a value that depends on what the current state of a program is. If the state of a program can be represented by the type State, then the type State -> a represents something whose value depends on the current state of the program. If the State is one thing, then it will return one value, and if the State is something else, then it will return another value. Then, if a function has the type a -> (State -> b), that means it takes in some input value and returns an output value that depends on the state of the program.

We can also model things that both read the state of the program and modify it with the State -> (a, State) type. This will read the state of the program, and return the return value as well as a new state for the program. This sort of side effect is very useful, and when combined with the error-handling capabilities of Maybe, is essentially a model of what IO does: it returns a value dependent on the state of the world at the time it is executed, but it can also modify the state of the world and fail for different reasons.

Both of these state-dependent types follow the applicative and monad algebras because applicatives and monads both capture the idea that side effects can often be composed. With the type State -> (a, State), for example, if we want to compose two functions f and g with stateful side effects in a monadic way, we can simply take the state that f produces and feed it as the state into g, creating the illusion that f really does modify the state of the program before g is run. Applicatives and monads therefore abstract the handling of side effects into the background. This allows us to easily build anything that uses state, from random number generators to parsers.
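
As a minimal sketch of that idea (Counter, tick, label, and composeStateful are made-up names for illustration, not the real State machinery from the libraries):

type Counter = Int
type Stateful a = Counter -> (a, Counter)

tick :: Stateful Int                 -- return the current count and advance the state
tick n = (n, n + 1)

label :: Int -> Stateful String      -- build a label from a value, leaving the state alone
label x s = ("item-" ++ show x, s)

-- monadic composition for this type: run f, then feed its result and its new state into g
composeStateful :: (b -> Stateful c) -> (a -> Stateful b) -> (a -> Stateful c)
composeStateful g f = \a s0 -> let (b, s1) = f a s0 in g b s1

-- (label `composeStateful` (\_ -> tick)) () 0 = ("item-0", 1)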

The <*> function from the applicative side of things also composes actions together, though not as obviously (we noted earlier that in the case of Maybe, if one computation fails, the whole computation fails). You can think of the difference between applicatives and monads in the following way:

  • Applicatives compose actions of the form f (a -> b) and f (b -> c) together to form an action of the form f (a -> c)
  • Monads compose actions of the form a -> f b and b -> f c together to form an action of the form a -> f c

Main Idea #7: Applicatives and monads both model running computations in sequence, but monads are more powerful

By composing side effects together, both applicatives and monads are able to run computations in the order that you specify. However, there is an important difference: the way applicatives compose these hybrid side-effect-containers is insensitive to the values inside the containers, while the way monads compose them depends on those values. When <*> combines a list with 4 items and a list with 7 items, the result will always be a list with 28 items, no matter what those items are. The (<=<) :: (b -> f c) -> (a -> f b) -> (a -> f c) function, however, allows the input to the second function (b) to depend on what is inside the output of the first function (f b), so we can never be certain in advance what the result of composing computations monadically will look like. Allowing this extra dependency makes monads inherently more powerful: with applicatives you can sequence computations, while with monads you can sequence computations where the result of one computation can depend on the result of another. (Loosely speaking, monads give you the full power of a Turing machine, while applicatives alone are closer to a pushdown automaton.) On the other hand, with applicatives you can tell when a computation will fail (it fails if any of the components fail), while with monads there may be many, many dependencies, and determining whether a computation will fail can be undecidable in general.
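
A small illustration with lists (rangeTo is a made-up helper): the applicative combination below always yields one result per pair of inputs regardless of the values, while the monadic composition yields a result whose size depends on the values themselves.

(+) <$> [1, 2] <*> [10, 20, 30] = [11, 21, 31, 12, 22, 32]  -- always 2 * 3 = 6 results

rangeTo :: Int -> [Int]
rangeTo x = [1 .. x]

(rangeTo <=< rangeTo) 3 = [1, 1, 2, 1, 2, 3]  -- the shape depends on the values produced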

Main Idea #8: Applicatives and monads come with nice syntax

Suppose you have two lists and you want to take the product of every pair of values (where one value in the pair comes from one list and the other value comes from the other list). With applicatives, you can do it like so:

products = (*) <$> [1, 2, 3] <*> [2, 6, 10]

You can combine an arbitrary number of values using some pure function (the result below is Nothing, because one of the arguments is Nothing):

result = (\w x y z -> x + y - z * w) <$> Just 3 <*> Just 10 <*> Nothing <*> Just 25

This works because the <$> function can be defined as a combination of pure and <*>, the two operations characteristic of the applicative algebra:

(<$>) :: (a -> b) -> (f a -> f b)
f <$> x = pure f <*> x

And so what we are really saying is:

result = (((pure (\w x y z -> x + y - z * w) <*> Just 3) <*> Just 10) <*> Nothing) <*> Just 25

It is the wrapping of a pure function inside its own container followed by repeated partial application using the <*> operator.

With monads, you can use the special “do notation”:

products = do
    x <- [1, 2, 3]
    y <- [2, 6, 10]
    return (x * y)

The do notation also makes it obvious that the result of one computation can depend on the result of another:

products = do
    x <- [1, 2, 3]
    y <- [1 .. x]
    return (x * y)

Internally, it uses the (>>=) :: f a -> (a -> f b) -> f b operator, which can be defined using the (<=<) monadic composition operator:

(>>=) :: f a -> (a -> f b) -> f b
x >>= f = (f <=< (\() -> x)) ()  -- wrap x in a dummy function, compose with f, then apply to ()

Intuitively, this operator, called “bind”, takes a container of some value(s) and feeds the value(s) inside it to a monadic function, and the result will be another container of some value(s). For example, if we had some Maybe a value and some a -> Maybe b function, we could combine the two using the bind operator to produce a Maybe b value. Notice how this again emphasizes that monads can sequence actions in which the next computation can depend on the result of the previous.
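
For instance, with Maybe (safeHead is a made-up helper that returns the first element of a list, if there is one):

safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

Just [1, 2, 3] >>= safeHead = Just 1
Just []        >>= safeHead = Nothing
Nothing        >>= safeHead = Nothing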

Using this function, we can desugar the do-notation example above into:

products = [1, 2, 3] >>= \x -> [1 .. x] >>= \y -> return (x * y)

Which is exactly what Haskell will do internally.

Main Idea #9: Monoids are a general algebra for combining things in a certain way

Monoids are the most basic way of combining values together. Unlike functors, applicatives, and monads, they don’t necessarily have to combine containers of values: they can combine any values with certain properties. Monoids technically are an algebra over a set of elements and a certain binary operation ((<>) :: a -> a -> a) such that:

  1. The binary operation is associative
  2. There exists an identity element (mempty) for which the following identity law holds: mempty <> x = x <> mempty = x

Lists are monoids: the operation to combine them is simply concatenation. So are many other types of containers, but interestingly, so are monads.
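
For example (Sum and Product are newtype wrappers from Data.Monoid that choose whether numbers combine by addition or by multiplication):

[1, 2] <> [3, 4]       = [1, 2, 3, 4]   -- for lists, (<>) is concatenation and mempty is []
"mono" <> "id"         = "monoid"       -- a String is just a list of characters
Sum 3 <> Sum 4         = Sum 7          -- mempty is Sum 0
Product 3 <> Product 4 = Product 12     -- mempty is Product 1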

Main Idea #10: Monads are monoids in the category of endofunctors

There is one last way to think about monads: the ability to join together a container of containers of values into a container of values. To see why this, along with return, is sufficient for something to follow the monad algebra, suppose we had a function, join :: f (f a) -> f a. Then, we could implement the monadic composition operation like so:

(<=<) :: (b -> f c) -> (a -> f b) -> (a -> f c)
g <=< f = \x -> join (fmap g (f x))

Which is sufficient for something to be a monad. Often, the join function is easier to think about than the weird container-side-effect-special-monadic function composition operator. For example, to join a list of lists into a list, we can simply fold concatenation over the outer list. To join a tree of trees into a tree, we simply graft the inner trees onto the leaves of the outer tree. For types having to do with state and side effects, however, the function composition view is often more intuitive.
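
To make this concrete, here is how join (from Control.Monad) behaves for lists and for Maybe:

join [[1, 2], [3], []] = [1, 2, 3]   -- for lists, join is concat
join (Just (Just 3))   = Just 3      -- collapse the nested Maybe
join (Just Nothing)    = Nothing
join Nothing           = Nothing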

Now, let’s consider a specific container type, like List, and the join operation, which turns a List of Lists into a List. We can see that join is something that collapses two layers of a functor into one, and we can see that return is a sort of identity for it: join . return = id, and likewise join . fmap return = id. It turns out that join is also associative: if you have three or more layers of functors over values, it doesn’t matter in which order you merge the layers to reduce them down to a single layer of a functor over a value (join . join = join . fmap join). If we squint our eyes a little bit, we notice this reminds us of the monoid algebra, where join is the operation that combines, in this case, two layers of a functor into a single layer of a functor. That is what is meant by probably the most terse description of monads: “Monads are monoids in the category of endofunctors”.

Where to next?

Hopefully this post gave a nice overview of the different properties of the four algebras. From here, it will be useful to learn how to use monads, applicatives, and the rest with specific types, like:

  • Maybe
  • IO
  • List
  • Either
  • State
  • Parser

These four algebras come up over and over again in Haskell, so while it is nice to understand what they are generally speaking, in practical usage you will need to know how to use the algebras for each different type.

Please leave comments below and let me know if you have any corrections or need any clarifications.


Purely Functional: Graph Search Algorithms

Graph search algorithms solve the following general problem:

Suppose you have a graph, and you begin at some node on the graph. This graph, for example, could represent a map of New York City — the edges could be streets, nodes could be intersections of streets, and the starting node could be your current location. The problem is to find a path with some property that ends at some goal node. On the graph of New York City, you might want to find the shortest route to a bar. On a graph of a game of chess (where nodes are game states and edges are moves), you might want to find any way for you to win. For our purposes, we will assume that the graph is connected, meaning there exists a path between every pair of nodes.

For now, we will consider two graph search algorithms: depth-first and breadth-first search. Depth-first search (DFS) simply finds any path from the start node to a goal node, while breadth-first search finds the shortest path. DFS starts at the starting node and takes an edge over and over again until it reaches a goal node. Sometimes, if it reaches a situation where it cannot reach any other node from where it is and has still not reached a goal node, it will backtrack up one level and try again with another node. If the graph is infinitely large, then DFS may go on forever.

Breadth-first search (BFS) takes a different strategy: from the starting node, it iteratively expands a “search bubble” containing all nodes within a certain distance. It then looks for a goal node within this bubble. If it does not find one, it expands the bubble. Even if the graph is infinitely large, BFS will find a goal node if one exists, and it will find the shortest path to it.

We should first think about how we can represent a graph. The important thing about a graph is that each node consists of some unique “label” and a list of “connections” to other nodes. Furthermore, if we have information about just a single node in the graph, we can recover the entire graph by just following the connections from each node to the next. If we do this enough times (until we start revisiting nodes we have already seen), we can determine what the entire graph looks like. Therefore, we only need to have the representation of a single node (the node’s label and its connections), and the entire graph can be reconstructed automatically. We will call the node we choose to represent the entire graph with the “focus” of the graph. In Haskell, we can represent a graph the following way:

data Graph a = Node { label :: a, connections :: [Graph a] } 

Therefore, we can represent a simple cyclic graph with 5 vertices the following way:

cycle5 :: Graph Int
cycle5 = Node 1 [Node 2 [Node 3 [Node 4 [Node 5 [cycle5]]]]]

Note how, just like lists and trees, graphs in Haskell are defined recursively. This means that if we were to traverse this graph using some search procedure, we would keep cycling from 1 to 5.

It is often useful to prune the graph into a search tree so as to remove any cycles. In other words, we can transform cycle5 into the following:

pruned_cycle5 :: Graph Int
pruned_cycle5 = Node 1 [Node 2 [Node 3 [Node 4 [Node 5 [Node 1 []]]]]]

And it will be understood that the node labeled 1 later in the graph is the same as the node labeled 1 that is the focus of the graph. Pruning turns the infinitely large, recursively defined representation of the graph into an only finitely large representation that can be used to reconstruct that very same graph. We can define the pruning function in the following way:

prune :: Eq a => Graph a -> Graph a
prune g = prune' [] g
    where
        prune' ps (Node x xs) = if x `elem` ps
            then Node x []
            else Node x (map (prune' (x:ps)) xs) 

The prune function will traverse the graph by following connections and maintaining a list of node labels it has seen so far — if it encounters a node it has seen before, it won’t expand its connections. Pruning is useful in both DFS and in BFS. Without pruning, DFS may not terminate (if it encounters a cycle in the graph). Although BFS will always terminate, pruning speeds it up by eliminating cycles in the graph. This does not change the result of the BFS because the shortest path can never contain cycles — if it did, then we could always cut the cycle out of the “shortest” path, and we would get a shorter path. We will always prune the graph before feeding it into a search algorithm.

The easier search algorithm to implement is DFS because we only need to find a single path to a goal node, not necessarily the shortest one. In Python, DFS would be implemented with a stack like so (assuming the graph has already been pruned):

def dfs(isGoal, startNode):
    stack = [[startNode]]
    while len(stack) != 0:
        path = stack.pop()
        if isGoal(path[-1]):
            return path
        else:
            for child in path[-1].connections():
                stack.append(path + [child])
    return None 

Because we are using a stack, a last-in-first-out (LIFO) data structure, we always pop out the last path we extended. If we end up at a dead-end, we always end up popping out the last path we evaluated, which enables backtracking. In Haskell, we don’t need a separate stack to store all the paths — we can use recursion, which is essentially using the call stack as our LIFO structure. Here’s how we can do it:

import Control.Monad (msum)

dfs :: (a -> Bool) -> Graph a -> Maybe [a]
dfs p (Node x xs) = if p x
    then Just [x]                             -- the focused node is already a goal
    else fmap (x:) . msum . map (dfs p) $ xs  -- search each connection, keep the first success

First, let’s inspect the type signature and convince ourselves it’s the same as the Python program. The first argument is the “isGoal” parameter, which is a function that takes in a node label as an argument and returns a Boolean indicating whether or not it qualifies as a goal node. The second argument is a graph, with the starting node as its focus. The dfs function can either output a list of node labels representing a path from the starting node to a goal node wrapped in a Just constructor, or it can output a Nothing, indicating that there is no path from the starting node to a goal node. p is the goal predicate, x is the label of the node in focus, and xs is the list of connections for that node (recall that it is of type [Graph a]).

The first part of the if-branch is relatively simple: if the currently focused node, x, satisfies the goal predicate, then the path to a goal is Just [x] — that is, we’re already at a goal node. To understand the second part, let’s first try to understand its components:

  1.  map (dfs p) takes a list of graphs and runs the same DFS procedure on all of them, with the same goal predicate. It will output a list of Maybe [a]s.
  2. msum takes a list of Maybe t values and returns the first one that is not Nothing. The “m” in msum :: MonadPlus m => [m a] -> m a refers to “MonadPlus”, a class of monads with a “zero” element and a way to combine alternatives; Maybe is an instance. We can think of Nothing as that zero element, and we are summing up all the Maybe t values in a list. Because zero is the identity for addition, it will be ignored in msum and we will, by convention, get back the first Just value.
  3. fmap (x:) prepends x to the list inside a Just. fmap :: Functor f => (a -> b) -> f a -> f b is a “lifting” operation for functors, which are generally containers for elements. In this case, our functor is Maybe. As an example:
    (+1) :: Int -> Int
    fmap (+1) :: Maybe Int -> Maybe Int
    fmap (+1) (Just 3) = Just 4
    fmap (+1) Nothing = Nothing
    fmap (1:) (Just [2, 3, 4]) = Just [1, 2, 3, 4]
    fmap (1:) Nothing = Nothing 

    Note that fmap will return Nothing if you try to operate on Nothing.

Putting this all together: if x does not satisfy the goal predicate, then we run dfs p on all the connections of the focused node, take the first one that succeeds in finding a goal node, and prepend x to that path, because x had to be visited before traversing it. If DFS fails to find a goal node through any of the connections, then because fmap f Nothing = Nothing for all f, Nothing will be returned. Furthermore, msum [] = Nothing (intuitively, the sum of zero things is zero), so if a node does not satisfy the goal predicate and has no connections, then Nothing will be returned, as we expect. In all cases, therefore, this dfs function is correct.

At first glance, it looks like this implementation of DFS runs on every single connection of a node, whereas DFS should only run along a single path and backtrack only when necessary. Thanks to laziness, however, that is not the case. msum will only have to evaluate the contents of the list map (dfs p) xs until it finds the first successful path, and DFS will not be run on the remainder of the list. Backtracking is not as explicit in this implementation, but it occurs when msum encounters a Nothing in the list, ignores it, and moves on to the next element.

We can confirm that the entire graph is not being searched by using one of the tools in the Haskell toolbox, the :sprint command in GHCi (abbreviated :sp below). It will show you how a given variable is currently represented in memory: which portions are thunks and which portions have been evaluated. For example, if we define:

let x = 1 : 2 : 3 : 4 : [] 

Then:

GHCi> :sp x
====> x = _ 

Which indicates that because of lazy evaluation, x is entirely a thunk. Now, if we evaluate the head of x:

GHCi> head x
====> 1
GHCi> :sp x
====> x = 1 : _ 

Which indicates that only the portion of x that has been evaluated is the head, and the tail is still a thunk. We can use :sp to determine which portions of a data structure a function evaluates — in this case, which nodes in a graph our DFS procedure touches. Let’s try it out:

GHCi> let g = Node 1 [Node 2 [Node 3 []], Node 4 [Node 5 []], Node 5 []]
GHCi> let pg = prune g
GHCi> :sp pg
====> pg = _
GHCi> dfs (== 5) pg
====> Just [1, 4, 5]
GHCi> :sp pg
====> pg = Node 1 (Node 2 [Node 3 []] : Node 4 (Node 5 _ : _) : _) 

Clearly, it does not have to process the entire search tree — once it finds a path to 5, it stops and returns that path.

BFS is a bit more difficult to tackle. But first, let’s discuss how we would implement BFS in Python:

def bfs(isGoal, startNode):
    queue = [[startNode]]
    while len(queue) != 0:
        path = queue.pop(0)
        if isGoal(path[-1]):
            return path
        else:
            for child in path[-1].connections():
                queue.append(path + [child])
    return None 

The only difference is that instead of a stack, we have a first-in-first-out (FIFO) queue! The basic idea is to preserve the invariant that in each iteration, we always pop the shortest path out of the queue. This is because every time we extend a path, we add it to the end of the queue, and we always pop from the front, selecting the paths that haven’t been extended yet.

In Haskell, however, the implementation for DFS and BFS aren’t quite so similar. What complicates things is that while stacks can be implemented easily using ordinary recursion, queues don’t have a simple analogue. We would like to avoid using an extra data structure, if possible, and stick to recursion.

Recall that BFS accomplishes two things: it finds the shortest path to a goal node, and it does not touch any part of the graph outside the search bubble. Therefore, if we find that the shortest path to a goal node is 3 steps long, then all nodes that are more than 3 steps away from the starting node should remain untouched after the algorithm terminates.

First, let’s suppose we had a function, rankedPaths, which would take in a graph and output a list of all paths from the focus of the graph to every other node in order of increasing length. For now, we can simply leave this as undefined:

rankedPaths :: Graph a -> [[a]]
rankedPaths g = undefined 

Leaving it as undefined for now allows us to specify a type for rankedPaths without having to implement it, so that we can specify how it will interlock with other functions but put off actually writing the function for later.

We also have a function in the Data.List module called find :: (a -> Bool) -> [a] -> Maybe a which will find the first element in a list that satisfies a given predicate. If it finds it, it will return that element wrapped in a Just constructor; if it does not, it will return Nothing. Finally, we also have a function called last :: [a] -> a that will return the last element of a list. How can we put these together to implement BFS?
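
For example:

find even [1, 3, 5, 6, 8] = Just 6
find even [1, 3, 5]       = Nothing
last [1, 2, 3]            = 3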

What we want to do is implement a function bfs :: (a -> Bool) -> Graph a -> Maybe [a] that will first generate a list of all paths in order of increasing length, and then select the first path whose last node satisfies our goal test. The following code does just that:

bfs :: (a -> Bool) -> Graph a -> Maybe [a]
bfs p = find (p . last) . rankedPaths

However, we need to make sure that rankedPaths is evaluated lazily — find will stop as soon as it finds an element that satisfies the given predicate, but we need to make sure that evaluating the first element of the list does not force evaluation of the entire list. We want only the nodes inside the search bubble to be touched.

To implement rankedPaths, we have to ensure that evaluating a given element of the list will not force the evaluation of any elements that come afterwards — we will do so by defining the tail of the returned list recursively and the head normally. First, note that if our graph is Node x xs (recall x will pattern-match as the label of the node currently in focus, and xs will pattern-match as the list of connections of this node), then the shortest path in this graph is [x], which is just a path starting and ending at x. The next shortest paths would be the paths from x to its connections, and then the paths from x to the connections of its connections, and so on.

So now we’ve learned a bit more about what rankedPaths should look like:

rankedPaths :: Graph a -> [[a]]
rankedPaths (Node x xs) = [x] : undefined 

Next comes the induction step: suppose we knew what map rankedPaths xs, or rankedPaths for every connection of the currently focused node, was. Can we combine this data to get rankedPaths for the currently focused node? If we have ranked lists of paths for all the connections of a node, can we combine them into a single ranked list of paths for the node?

We can, if we imagine we had a function called mergeRanked :: [[a]] -> [[a]] -> [[a]]. This function will take two ranked lists of paths and merge them into a single ranked list of paths. Then, we can use foldr mergeRanked [] to merge a list of ranked lists of paths into a single ranked list of paths, and prepend x to each path using map (x:). Finally, we also have to account for the single path beginning and ending at x, so we prepend the path [x] to the entire resulting list. The following is how we can implement rankedPaths:

rankedPaths :: Graph a -> [[a]]
rankedPaths (Node x xs) = ([x]:) . map (x:) . foldr mergeRanked [] . map rankedPaths $ xs 

Of course, we still have to implement mergeRanked. The basic idea is similar to how we implement the merge process in merge sort — we repeatedly take the smaller of the two heads from the two lists until one of the lists is empty:

mergeRanked :: [[a]] -> [[a]] -> [[a]]
mergeRanked [] ys = ys
mergeRanked xs [] = xs
mergeRanked (x:xs) (y:ys) = if length x < length y
    then x : (mergeRanked xs (y:ys))
    else y : (mergeRanked (x:xs) ys) 
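
For example, merging two ranked lists of paths keeps the combined list ordered by length:

mergeRanked [[1], [1, 2, 3]] [[4, 5]] = [[1], [4, 5], [1, 2, 3]]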

Altogether, we have:

import Data.List (find)

bfs :: (a -> Bool) -> Graph a -> Maybe [a]
bfs p = find (p . last) . rankedPaths
    where
        rankedPaths :: Graph a -> [[a]]
        rankedPaths (Node x xs) = ([x]:) . map (x:) . foldr mergeRanked [] . map rankedPaths $ xs
        
        mergeRanked :: [[a]] -> [[a]] -> [[a]]
        mergeRanked [] ys = ys
        mergeRanked xs [] = xs
        mergeRanked (x:xs) (y:ys) = if length x < length y
            then x : (mergeRanked xs (y:ys))
            else y : (mergeRanked (x:xs) ys) 

And that’s the implementation of BFS! Side by side with the Python implementation, it isn’t quite as terse, but it idiomatically uses recursion and laziness to our advantage. We can check that BFS is actually lazy and doesn’t touch anything outside the search bubble by using the :sp tool again:

GHCi> let g = Node 1 [Node 2 [Node 3 []], Node 4 [Node 5 []], Node 5 []]
GHCi> let pg = prune g
GHCi> :sp pg
====> pg = _
GHCi> bfs (== 5) pg
====> Just [1, 5]
GHCi> :sp pg
====> pg = Node 1 [Node 2 [Node 3 _], Node 4 _, Node 5 _] 

And indeed, nothing outside of 1 step beyond our search bubble has been touched.

Please leave comments below and let me know if you have any corrections or need any clarifications.