RC W2D2 - Lazy vs eager evaluation

In a previous post, I shared how my plan at RC was inspired by the courses I did with David Beazley. I wasn’t sure then if I was paraphrasing him accurately, but yesterday he wrote in his newsletter:

Meanwhile, I've managed to dig myself into a pretty deep rabbit hole as a result of the month-long courses I offered back in January. The extra time allowed me to make an attempt at writing the compiler project in Haskell, which to be honest, kind of blew my mind and exposed a totally different way of doing things in Python. I'll have more to say about that once I finish putting the pieces of my brain back together.

Learning a new language usually starts off a little painful, but you feel a bit better once you start to string words together - this applies to both natural and programming languages. Haskell is not only functional but also has an expressive type system. It takes a bit of getting used to, but so far so good.

Chapter 4 of SICP involves writing a Scheme interpreter in Scheme, the so-called metacircular evaluator. This exercise also introduces the notion that you can make the language do what you want, for example make Scheme use lazy evaluation instead of eager (SICP calls eager evaluation ‘applicative order’ and lazy evaluation ‘normal order’, discussion here). I was intrigued. What else do you need to couple with lazy evaluation to make the language ‘work’?
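A quick way to see the difference in Haskell itself (my own sketch, not from SICP; `constFirst` is a made-up name): an argument that is never used is never evaluated, so passing in a diverging expression is harmless.

```haskell
-- constFirst returns its first argument and ignores the second.
constFirst :: Int -> Int -> Int
constFirst x _ = x

main :: IO ()
main = print (constFirst 42 (1 `div` 0))
-- prints 42: the division by zero is never forced.
-- Under eager (applicative-order) evaluation, the argument would be
-- evaluated first and the program would crash.
```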

From the CIS 194 lecture notes.

Lazy evaluation makes it hard to reason about when things will be evaluated; hence including side effects in a lazy language would be extremely unintuitive. Historically, this is the reason Haskell is pure: initially, the designers of Haskell wanted to make a lazy functional language, and quickly realized it would be impossible unless it also disallowed side effects.

I assumed Haskell was intended to be pure from the start. TIL it’s a corollary of wanting the language to be lazy.
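One rough way to see why laziness and side effects mix badly (my own sketch): `Debug.Trace.trace` prints a message as a side effect when its value is forced, so the output order follows demand rather than the order things appear in the source.

```haskell
import Debug.Trace (trace)

x :: Int
x = trace "evaluating x" (1 + 1)

y :: Int
y = trace "evaluating y" (2 + 2)

main :: IO ()
main = do
  print (y + x) -- which "evaluating ..." message appears first depends on
                -- how (+) happens to demand its arguments, not on this line
  print x       -- x was already forced above, so no trace message this time
```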

An exercise in SICP involves representing map and filter operations as special cases of reduce. In fact SICP uses the term ‘accumulate’ for reduce, while Haskell uses ‘fold’ - we’ll use the latter term from here onwards. What I discovered in Haskell, and didn’t cover while doing SICP, are the practical implications of left folds (`foldl`) vs right folds (`foldr`).
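For reference, here is that exercise transplanted into Haskell (my own sketch; the names `mapAsFold` and `filterAsFold` are made up):

```haskell
-- map and filter expressed as right folds
mapAsFold :: (a -> b) -> [a] -> [b]
mapAsFold f = foldr (\x acc -> f x : acc) []

filterAsFold :: (a -> Bool) -> [a] -> [a]
filterAsFold p = foldr (\x acc -> if p x then x : acc else acc) []

main :: IO ()
main = do
  print (mapAsFold (* 2) [1, 2, 3])   -- [2,4,6]
  print (filterAsFold even [1 .. 10]) -- [2,4,6,8,10]
```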

First, a more illustrative explanation. Fold lets you apply an operation successively over a list; here we sum up the values in a list using a left fold.

```
foldl (+) 0 [1, 2, 3]
= foldl (+) (0 + 1) [2, 3]
= foldl (+) ((0 + 1) + 2) [3]
= foldl (+) (((0 + 1) + 2) + 3) []
= (((0 + 1) + 2) + 3)
= ((1 + 2) + 3)
= (3 + 3)
= 6
```
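
For reference, a textbook definition of left fold that produces the expansion above (renamed `myFoldl` here to avoid clashing with the Prelude’s):

```haskell
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ acc []       = acc
myFoldl f acc (x : xs) = myFoldl f (f acc x) xs
-- the recursive call is in tail position; the work piles up in the accumulator
```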

This is in contrast with right folds.

```
foldr (+) 0 [1, 2, 3]
= 1 + foldr (+) 0 [2, 3]
= 1 + (2 + foldr (+) 0 [3])
= 1 + (2 + (3 + foldr (+) 0 []))
= 1 + (2 + (3 + 0))
= 1 + (2 + 3)
= 1 + 5
= 6
```
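
And the corresponding textbook definition of right fold (again renamed to avoid the Prelude clash):

```haskell
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []       = z
myFoldr f z (x : xs) = f x (myFoldr f z xs)
-- the outer application `f x ...` is produced immediately; the recursive
-- call only gets evaluated if f demands its second argument
```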

In an eager language, left folds are memory-efficient because the accumulator can be evaluated at each step (`0 + 1`, then `1 + 2`, then `3 + 3`) and the recursive call is in tail position, so with tail-call optimization the whole fold runs in constant space. With right folds, each element needs its own stack frame, because no addition can happen until the end of the list is reached.

In a lazy language, however, expressions are only evaluated as needed. A lazy left fold therefore doesn’t evaluate the accumulator as it goes; it builds up a chain of unevaluated expressions (‘thunks’) as long as the list, which only gets collapsed at the very end. A right fold, on the other hand, hands back the outermost application (`1 + (...)` above) straight away, and evaluation of the rest is ‘suspended’ until something actually demands it.
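A common way to see this in practice (my own sketch; how badly the lazy version behaves depends on your GHC version and optimization flags): sum a large list with the lazy `foldl` versus the strict `foldl'` from `Data.List`, which forces the accumulator at every step.

```haskell
import Data.List (foldl')

main :: IO ()
main = do
  -- lazy left fold: builds roughly ten million nested (+) thunks before
  -- evaluating anything, so memory use grows with the list and collapsing
  -- the giant thunk can overflow the stack
  print (foldl  (+) 0 [1 .. 10000000 :: Int])
  -- strict left fold: the accumulator is forced at each step, constant space
  print (foldl' (+) 0 [1 .. 10000000 :: Int])
```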

A key difference is that right folds can work on infinite lists, whereas left folds can’t. Mind blown.
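A small demonstration of that claim (my own sketch; `firstEven` is a made-up name): as long as the combining function can return a result without looking at the rest of the fold, `foldr` never has to reach the end of the list.

```haskell
firstEven :: Maybe Int
firstEven = foldr (\x rest -> if even x then Just x else rest) Nothing [1 ..]

main :: IO ()
main = do
  print firstEven                      -- Just 2: the fold stops as soon as it can
  print (take 5 (foldr (:) [] [1 ..])) -- [1,2,3,4,5]: the list is rebuilt lazily
  -- foldl (+) 0 [1 ..] would never return: it has to reach the end of the
  -- list before producing anything at all
```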

The best explanation I found is a comment in a GitHub PR. This Stack Overflow post has a performance comparison between the two (spoiler: right fold is faster). This Haskell wiki page expands a bit more on the eager (strict) versions of the folds.

Next question - what explains the coupling between the functional paradigm and an expressive type system?