RC W1D4 - Thinking long-term

In my previous post, I shared thoughts on building patience. A discussion followed on RC chat which highlighted two aspects of patience.

The first relates to 'patience to deal with finicky technical things'. I feel I've gotten better at this over time. In earlier days I would Stack Overflow just enough to get past the problem I was facing. More recently I try to be more deliberate about spending time reading the documentation and, where necessary, the source code. Ben Kuhn says this best in his post.

The second, which is what I had in mind, relates to 'patience in dealing with other people and organizational issues'. I used to (and in weaker moments still do) focus on getting OKRs done fast and aim to be the '10x engineer' on the team. Now I try to be more deliberate about the longer-term issues surrounding the work. How does the work relate to my team's goals? To stakeholder teams' goals? To the org's goals? What does the company product roadmap look like, and how does the work fit into the bigger picture? A few months back I wrote a design doc that other engineers executed on, which I hope is the first of many.

Another way to think about the latter form: if you were hired to build out a team to achieve company objectives, how would you go about doing it? This requires a mental shift from thinking about what you would do in a quarter to how you would prioritize, execute, and hire over multiple years. This framing also helps you empathize with the challenges your manager is facing.

RC W1D4 - Domain-specific software (sketch)

OK, we're back to 'sharing sketches of my blog post ideas along the way'. Here we go.

When I think about CS topics, I think about operating systems, databases, networking, etc. These are domain-agnostic. I began to ponder: are there specific aspects of a language, framework, or paradigm that are particularly useful for working in a specific domain (finance, healthcare, education, etc.)? What choices would you make based on what you're optimizing for? Is domain-specific software a thing?

SICP touches on this at the start of Chapter 3:

One powerful design strategy, which is particularly appropriate to the construction of programs for modeling physical systems, is to base the structure of our programs on the structure of the system being modeled.

This preface led to a discussion of mutability and functional vs imperative models. A related idea is the argument that the functional model allows us to focus on the domain, and not the step-by-step execution.

I had a coffee chat with Jacquin Mininger which led to a number of threads and resources, sketched out below. I'll likely finesse these once I've done a bit more homework. Mistakes are my own.

Types - With strong types, we're able to constrain the space of illegal states (since we can't use a function that operates on strings on integers, say). With static types, we're able to run these checks at compile time. With an expressive type system, you can even represent, say, a EUR/USD pair with its own type instead of a float, so it's not possible to mistake it for a GBP/USD pair (I believe this example is from here, which I need to re-read).
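A minimal sketch of the pair example in Haskell (the types and function here are my own, hypothetical):

newtype EurUsd = EurUsd Double
newtype GbpUsd = GbpUsd Double

-- The compiler now rejects mixing the two pairs up.
midpoint :: EurUsd -> EurUsd -> EurUsd
midpoint (EurUsd a) (EurUsd b) = EurUsd ((a + b) / 2)

-- midpoint (EurUsd 1.08) (GbpUsd 1.27)   -- type error at compile time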

Pure functions - When you don't have side effects, your code is easier to understand because its impact is isolated. It also makes refactoring easier. A thought that came up here is Jane Street being a fan of OCaml. While the functional aspect is a plus, it's easy to overlook that the bigger plus comes from the functional style and the type system working together to maximize how much of the domain can be 'expressed' in code.
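A small sketch of the refactoring point (my own example): pure functions support equational reasoning, so rewrites like the one below are guaranteed safe.

-- Both sides are guaranteed equal because succ has no side effects
-- whose order could matter.
lhs, rhs :: [Int]
lhs = map succ [1, 2] ++ map succ [3, 4]
rhs = map succ ([1, 2] ++ [3, 4])
-- lhs == rhs   -- True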

Proofs - These provide an even stronger check. Agda and Coq are popular choices, but can be slow. Haskell is faster. Rust is even faster and has a much larger supporting ecosystem.

Math - There are some domains like crypto where the desired behavior is expressed in math, which also translates more naturally into Haskell.

Performance - It's easier to reason about performance in imperative languages because the code is closer to what the processor is doing; it's correspondingly harder, or at least less natural, in a functional language. It's also easier to reason about performance when you don't have to worry about the garbage collector 'stopping the world'; here Rust, with no garbage-collected runtime, can fare better than Go.

Haskell - Related to the performance point above, this also means that you’ll need to know the compiler / GHC well to figure out where your bottlenecks are when writing Haskell.

Finance and crypto - The 'return on investment’ of using functional languages may be higher here given the emphasis on correctness. While this may weigh the choice towards correctness over performance, there are subdomains like high-frequency trading where you shouldn't be in business if you don't have performance.

Other domains - I have very little context on whether domain-specific software is a thing outside of finance and crypto, and would naturally love to hear thoughts. The one example I have is Nubank acquiring Cognitect / Clojure, which anecdotally reinforces the finance example.

RC W1D5 - Why I'm learning Haskell at RC

I had been wondering what to write about to recap Week 1 of RC. I came across a post by Cindy Wu sharing her RC application, which inspired me to review my own. My plan at RC has narrowed a fair bit. To the question "What would you like to work on at RC?”, I described two ideas.

The first is to work through the Haskell, Prolog and Idris sections of Seven Languages in Seven Weeks. I love seeing the trade-offs different languages make, and I'm keen to bring functional programming best practices into imperative code. This would also be an extension of the SICP course I completed recently. A lingering wonder I have is how concepts like composition help frame abstractions for domain-specific software - why did Cognitect join Nubank, and why does Jane Street love OCaml?

The second is to build a toy ChatGPT clone. I've build [sic] machine learning and neural network models from scratch before (and presented on them at RC and at my workplace), but generative models are more recent and a lot more resource-intensive. My plan is to spend about two weeks to get a basic understanding how they work, and either build one from scratch or use an existing model and 'retrain' one to write in my writing style say.

Re: the first idea, I had a quick look around Idris resources, and my impression was that I would get a lot more out of Idris after spending more time on Haskell (or at least, more than one week). I'm also treating functional programming and types as my focus; Prolog is logic programming, so it's on hold for now.

Re: the second idea, it's on hold. There's a lot of interest in the fast.ai course (which is a lot more comprehensive), and not so much in just going through Andrej Karpathy videos. To the question "Are there things you would do differently if you came back?", I feel focusing on the first idea also lets me deliver better on my answer:

I would be more deliberate in listening to what my batch mates are interested in, and use that to choose on things to work on based on what I can collaborate with and pair on.

Now circling back, why Haskell?

I had a manager once who enjoyed functional programming so much that he named his son Haskell (yes, it's an actual name, and hearing this got me more excited about the company). My previous RC batch had an active study group on functional programming, so there's also a bit of catching up on the fun I missed.

More recently, I did David Beazley's course on SICP (details here, write-up here). Learning the trade-offs that different languages make definitely helped me feel more confident in my main language, Python (which is imperative). I followed up the SICP course with his compilers course, where Dave implemented a compiler in both Python and Haskell, and described how the parser combinator he built for the latter blew his mind.

If someone with that level of Python mastery can learn something new from Haskell, maybe I can too (on mastery, here is Dave live-coding a WebAssembly interpreter on stage).

I admit the things I've been excited about have moved around a lot: machine learning, neural networks, build and deploy systems (Bazel, Docker, Kubernetes), WebAssembly, Go, now functional programming. I look at PhDs and resident experts at VCs and go "What will I be an expert in?".

Maybe I won't find one. Maybe I'll continue cycling between consuming everything I can on a topic and burning out. Maybe what I learn doesn't compound that much, or at least not as naturally.

Then again, maybe it’s better to have ups and downs rather than be passive. In the words of my significant other:

There's something lethargic at best and dangerous at worst about idleness.

RC W2D1 - Sequels are hard

In my previous post, I shared snippets from my RC application and how my plans have changed over the first week. This was inspired by Cindy Wu sharing her own RC application, amusingly at a moment when she had trouble writing about RC.

I share this sentiment. My Week 1 recap from my first batch in Fall 2020:

What can I say? Week 1 of RC has been a blast! It’s affirming seeing the diversity of backgrounds and knowledge. As a self-taught programmer, this week helped illuminate how the path towards mastery is not linear. It’s multi-dimensional.

The prior recap, however, also had a quote by Paulo Coelho, a recording of Glenn Gould’s Goldberg Variations, the Jiro Dreams of Sushi trailer, and an FT Weekend article on the École Nationale d'Administration moving from Paris to Alsace (now permanently closed). These were snippets collected over the years, waiting to be shared with the world.

It’s true what Naval Ravikant says about sequels.

Maybe the first run, they took an idea they’ve had for a decade or three decades and they expounded on it and it sounded great. And then in the sequel, whether it’s a book or a movie, they have to come up with something equally good, but they only have two years to do it. And it’s just really difficult to do.

RC W2D2 - Lazy vs eager evaluation

In a previous post, I shared how my plan at RC was inspired by the courses I did with David Beazley. I wasn’t sure then if I was paraphrasing him accurately, but yesterday he wrote in his newsletter:

Meanwhile, I've managed to dig myself into a pretty deep rabbit hole as a result of the month-long courses I offered back in January. The extra time allowed me to make an attempt at writing the compiler project in Haskell, which to be honest, kind of blew my mind and exposed a totally different way of doing things in Python. I'll have more to say about that once I finish putting the pieces of my brain back together.

Learning a new language usually starts off a little painful, but you feel a bit better once you start to string words together - this applies to both natural and programming languages. Haskell is not only in the functional paradigm but also has an expressive type system. It takes a bit of getting used to; so far so good.

Chapter 4 of SICP involves writing a Scheme interpreter in Scheme, the so-called metacircular evaluator. This exercise also introduces the notion that you can make the language do what you want, for example making Scheme use lazy evaluation instead of eager (SICP uses the terms 'applicative' vs 'normal' order; discussion here). I was intrigued. What else do you need to couple with lazy evaluation to make the language 'work'?

From the CIS 194 lecture notes:

Lazy evaluation makes it hard to reason about when things will be evaluated; hence including side effects in a lazy language would be extremely unintuitive. Historically, this is the reason Haskell is pure: initially, the designers of Haskell wanted to make a lazy functional language, and quickly realized it would be impossible unless it also disallowed side effects.

I had assumed Haskell was intended to be pure from the start. TIL it's a corollary of wanting the language to be lazy.
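A tiny illustration of that corollary (my own example): laziness makes evaluation order hard to see, which is only tolerable if evaluation has no observable effects.

-- expensive is a thunk: it's only computed when cond is False. If it
-- could also print or mutate state, the program's behavior would hinge
-- on an evaluation order the reader can't see.
choose :: Bool -> Int
choose cond = if cond then 0 else expensive
  where
    expensive = sum [1 .. 1000000]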

An exercise in SICP involves representing the map and filter operations as special cases of reduce. In fact, SICP uses the term 'accumulate' for reduce, while Haskell uses 'fold' - we'll use the latter term from here onwards. What I discovered in Haskell that I didn't cover in SICP are the practical implications of left folds (`foldl`) vs right folds (`foldr`).
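As an aside, here's that SICP exercise in Haskell terms (a sketch using foldr; the names are my own):

-- map and filter as special cases of a right fold.
mapAsFold :: (a -> b) -> [a] -> [b]
mapAsFold f = foldr (\x acc -> f x : acc) []

filterAsFold :: (a -> Bool) -> [a] -> [a]
filterAsFold p = foldr (\x acc -> if p x then x : acc else acc) []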

First, a more illustrative explanation. Fold lets you apply an operation successively over a list. Here we sum the values in a list using a left fold:
foldl (+) 0 [1, 2, 3]
= foldl (+) (0 + 1) [2,3]
= foldl (+) ((0 + 1) + 2) [3]
= foldl (+) (((0 + 1) + 2) + 3) []
= (((0 + 1) + 2) + 3)
= ((1 + 2) + 3)
= (3 + 3)
= 6

This is in contrast with right folds.

foldr (+) 0 [1, 2, 3]
= 1 + foldr (+) 0 [2, 3]
= 1 + (2 + foldr (+) 0 [3])
= 1 + (2 + (3 + foldr (+) 0 []))
= 1 + (2 + (3 + 0))
= 1 + (2 + 3)
= 1 + 5
= 6

In an eager language, left folds are memory-efficient because each successive evaluation (i.e. `0+1`, `1+2`, `3+3`, etc.) can be done in constant space; this relies on what's usually referred to as tail call optimization. With right folds, we need to build up each successive operation in a stack frame.

In a lazy language, however, expressions are only evaluated as needed. This means that for left folds, we end up with a chain of unevaluated expressions (called 'thunks') as long as the list itself. Right folds, by contrast, produce the outermost application immediately, and evaluation can be 'suspended' there until the consumer demands more.

A key difference is that right folds can work on infinite lists, whereas left ones can't. Mind blown.
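A quick sketch to make this concrete (my own example; foldl' is the strict left fold from Data.List):

import Data.List (foldl')

-- foldr produces the outermost (:) immediately, so a lazy consumer
-- can stop early, even on an infinite list.
firstThree :: [Int]
firstThree = take 3 (foldr (:) [] [1 ..])   -- [1,2,3]

-- The same with foldl never terminates, since it must reach the end
-- of the list before producing anything:
-- take 3 (foldl (flip (:)) [] [1 ..])

-- When you do want strict accumulation, foldl' forces the accumulator
-- at each step, avoiding the chain of thunks that plain foldl builds.
total :: Int
total = foldl' (+) 0 [1 .. 1000000]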

The best explanation I found is a comment in a GitHub PR. This Stack Overflow post has a performance comparison between the two (spoiler: right fold is faster). This Haskell wiki page expands a bit more on the eager version of folds.

Next question - what explains the coupling between the functional paradigm and an expressive type system?

RC W2D3 - Payments and chargebacks

For non-technical talks, I presented on payments and chargebacks. In the spirit of the talk being non-technical, I left out the parts on the software stack and ML models (perhaps for a different post). The write-up below is roughly what I talked about.

When we think about a transaction, we think about a buyer making a payment in exchange for goods or services from the seller. To start off, let’s think about the payment in cash [1].

The cash model keeps things simple. However, it does require the buyer to have cash on hand, and more importantly, the buyer may not yet have enough cash. This is where credit cards come in, making things convenient by solving the portability and credit issues.

With credit cards, we’re now looking at more parties to the transaction. There is the credit card issuer, often referred to as the issuing bank. The funds now need to go to the merchant’s bank, also known as the acquiring bank. Finally we need an intermediary to go between the two banks, and in the spirit of keeping things simple, we’ll think of it as a single party called the payments processor [2].

More parties means more fees. Let's say 3% is taken out of the payment, so the seller gets 97 cents on every dollar. Most of that (roughly two-thirds) goes to the issuing bank.

The convenience also comes at the cost of potential fraud. A fraudster with stolen credit card details may pretend to be a legitimate seller and 'cash out' by processing the stolen details as a legitimate sale, i.e. seller fraud. The fraudster may also pretend to be a legitimate buyer and purchase goods/services from a legitimate seller, i.e. buyer fraud. Another possibility is compromising the seller's account and redirecting the funds to the fraudster's bank account, i.e. account takeover.

The payments processor may also be on the hook for a well-intentioned but poorly-run business by the seller. Imagine the seller selling concert tickets but unable to organize the concert. The buyers may reverse the transaction but the seller may be out of funds. Here the payments processor ‘fronts’ the payments to the seller, essentially taking on the credit risk.

The reversal request by the buyer is referred to as a chargeback. In fact, it's often less of a reversal and more of a 'mini court case' called a dispute, which decides which party should be on the hook. The issuing bank asks the buyer for more details, then the acquiring bank asks the seller for more details, and finally a determination is made (who decides often depends on the type of dispute).

For a finer breakdown of the different parties and the respective transaction fees, the Bloomberg article here has an excellent graphic. Modern Treasury has a great resource here. I enjoy reading Patrick McKenzie’s Bits about Money newsletter, and more recently listened to the Driven by Insight podcast interview with Alex Rampell. For chargebacks, Square has a Chargebacks 101 here.


[1] You can swap out the payment and/or the goods/services for the promise of payment/delivery, but there has to be a two-way exchange.

[2] This abstracts away the merchant services provider, payments gateway, etc

RC W2D4 - Comparing performance of left vs right fold

Today I prepared slides for Friday's presentation. I always find it helpful to do a dry run the day before; I'm told magic happens when you sleep on it.

I was happy to draw a line connecting SICP to Haskell, and it's a good periodic reminder when delving into functional programming. What's new is feeling pulled into a lot of theory (lambda calculus, type theory, category theory, proofs), discovering this is normal, and realizing that time spent on all this still doesn't cover what the compiler does in practice.

Re: the compiler, I was previously under the impression that lazy evaluation can be inefficient because you may do the same calculation more than once. For example, in calculating `square (1 + 2)`, you have to evaluate `(1 + 2)` twice in `((1 + 2) * (1 + 2))`, instead of just once if you do the sum first before squaring. It turns out GHC avoids this; from the CIS 194 lecture notes:

In particular, GHC uses a technique called graph reduction, where the expression being evaluated is actually represented as a graph, so that different parts of the expression can share pointers to the same subexpression.
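A small sketch of the sharing idea (my own example):

-- Binding the subexpression gives both uses a pointer to the same
-- thunk, so (1 + 2) is evaluated at most once.
sq :: Int
sq = let x = 1 + 2 in x * x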

My read is that Haskell builds abstractions that don't translate as naturally to what the processor does, and has to do heavy lifting for performance reasons. The code still runs fast, but maybe you'll need to know the compiler a bit better when you're trying to reason about your program or diagnose bottlenecks. This theme came up in the FP study group, on the topic of the additional 'bookkeeping' Clojure does under the hood to ensure your dictionaries are immutable.

In any case, I tried running the left fold vs right fold example here locally. This involves concatenating 100mm integers and summing them up (I didn't get a stack overflow like in the post, which is impressive heavy lifting!). This is for the right fold.
stack exec cis194-exe  2.27s user 0.85s system 116% cpu 2.672 total

Now the left.
stack exec cis194-exe  13.14s user 2.97s system 101% cpu 15.849 total

That's 6x! Reminder to self to read up more on weak head normal form (WHNF). My understanding so far is that left folds are more expensive here because you need to expand everything out, whereas with right folds you get to keep things in WHNF until the evaluation happens.
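As a rough sketch of what the comparison looks like (my reconstruction, not the linked post's exact code):

-- Sum a large list with each fold; compile and time each variant.
sumRight :: Integer
sumRight = foldr (+) 0 [1 .. 100000000]

sumLeft :: Integer
sumLeft = foldl (+) 0 [1 .. 100000000]

main :: IO ()
main = print sumRight   -- swap in sumLeft to compare timings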

One of my best memories at work involves Python code my previous manager wrote in a more functional style using `functools`. On discovering that it actually ran slower than the idiomatic Python version, another colleague said "It's harder to read, it's slower, so we're writing it this way for funsies?". Everyone laughed. My stomach hurt. It was amazing.

RC W2D5 - Moving between extremes

There are times I’m stuck not knowing what to write about, and resign myself to the possibility that the post may end up being just a single paragraph that I forced myself to write. Then I start writing that paragraph and end up with a few paragraphs, and wonder how I resigned myself to writing just one.

Note to self: next time this happens, reread previous posts and see what comes up (especially since I tend to leave breadcrumbs around).

At times I feel going into Haskell is escapism. Let's stick to it for a week and see where this goes.

My plan for RC was to learn Haskell. I felt a little demotivated on Day 1 and resolved to try it out for at least a week (it's Week 2, and I'm still at it). I wondered whether learning Haskell was my form of escapism, since I have a tendency to move between extremes. Previously I went all out learning about performance. Now I go all out learning about correctness. I start thinking about the possibility of working on correctness. I realize correctness is hard too. Existential crisis on Day 1 of RC.

I quoted Matt Klein previously and I can see parallels here. Soft skills don't come as naturally to me, so I decide to go hard on technical skills. Hard skills are hard too. It's a good quote, so I'll share it again here.

One last thing: don't let anyone tell you that the tech/engineering is the easy part. It's not. It's hard. Soft skills are also hard. It's ALL hard, and both are required to succeed.

It's common to swing between extremes. It's a good thing to honor the time we have in this world through dedication. However, it gets complicated when that dedication becomes so ingrained in one's identity that it's hard to take an honest look and ask whether we might be doing something for the wrong reasons.

OK, not to play therapist here (perhaps a topic for a different day). I did recommend Stutz to a friend recently, and she asked me if it was relatable. I said to watch the first 10 minutes; if it's not your thing now but becomes a thing 10 years down the road and you think about watching it again, those 10 minutes would be worth their weight in gold.

RC W3D1 - The labors of the functional programming enthusiast

To those who maintain a blog and wonder if you’re putting in too much effort relative to the value you get out of it: (1) Let’s chat (2) The person who finds this blog most helpful is myself.

It's the start of Week 3, and thankfully I'm feeling a bit more centered. I had been wondering if I was learning Haskell for the wrong reasons, did a presentation last Friday, and felt reassured in my decision when I was able to draw a line linking learning SICP in December to learning Haskell now.

In yesterday's post, I reflected on doing things for the wrong reasons. I suppose it's common to do things in extremes, hence the stereotype of the investment banker ditching the high life to get into spirituality. My version of this is getting into functional programming after spending time working on infra (though to be fair, I didn't do infra for that long).

I imagine I’m also not alone in wanting to learn one thing and then discover whole new worlds. The starting point was SICP into Haskell. Then there was lambda calculus, type theory, category theory. Maybe proofs. Maybe other functional languages. Maybe implementation / compilers.

In a nutshell, way too much. Too much for a half batch, too much for a full batch. I’m not sure my significant other would love the idea of a year-long sabbatical.

What is it all for anyway? How does it make me a better programmer?

I’m one third into the half batch. I feel compelled to share ’the story so far’.

The question 'what is functional programming?' came up in the FP study group last week. I like to think of the broader answer as 'functions as first-class citizens': you can pass functions as arguments, and functions can return other functions (often referred to as closures). By this definition, plenty of imperative languages, Python say, can be written in a functional style.
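Both halves of that definition, sketched in Haskell (my own example):

-- A function that takes a function as an argument...
twice :: (a -> a) -> a -> a
twice f = f . f

-- ...and a function that returns another function, closing over n.
adder :: Int -> (Int -> Int)
adder n = \x -> x + n

-- twice (adder 10) 1   -- 21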

Then there’s functional in the sense of ‘functional vs imperative’. In this context, functions are pure (no side effects) and variables are immutable. Thus calling the same function with the same arguments results in the same output every time.

Now, I imagine lots of people wanting to get into functional programming pick up Haskell, but Haskell is additionally a lazy language. This means that evaluation is delayed until results are actually needed, whereas most functional languages are eager. As per Brent Yorgey, it's impractical and unexpected for a lazy language not to be pure.

We next get to the question of types. Why the tight coupling between functional languages and types? I don’t have a good answer.

Some suggest it's a practical consideration, but I'm less convinced given the examples of most Lisps (functional but dynamically typed) and Rust (imperative but with algebraic data types). I lean more towards the explanation that language designers are 'people who like abstract ideas and want to implement abstract ideas'; there's a tradition of these languages being academic or experimental before becoming 'industrial strength'. Examples include ML becoming OCaml, and Haskell itself.

I got to the chapter on functors in UPenn's CIS 194 and thought they looked familiar. I had a little digression into Bartosz Milewski's category theory videos (I highly recommend the intro; his enthusiasm is infectious). Functors struck me as 'this abstract idea is cool, let's see what it looks like implemented in the language'.
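For a minimal look at what that means in practice (standard library Functor instances, nothing exotic):

-- fmap applies a function inside a structure, leaving the structure
-- itself intact; Maybe and lists are the canonical instances.
doubled :: Maybe Int
doubled = fmap (* 2) (Just 21)                  -- Just 42

lengths :: [Int]
lengths = fmap length ["functor", "category"]   -- [7,8]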

OK, so now we're in real deep. The intro video talked about the Curry-Howard correspondence, which basically says there is a direct relationship between computer programs (namely the typed lambda calculus) and mathematical proofs. This was later broadened into the Curry-Howard-Lambek correspondence, which extends the equivalence (or isomorphism) to categories as well.
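A tiny taste of the correspondence (my own example): read the type as a proposition and the program as its proof.

-- "If a implies b, and a holds, then b holds" - the body of the
-- function is the proof of this proposition.
modusPonens :: (a -> b) -> a -> b
modusPonens f x = f x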

"Stop, stop, I’m lost!”, you say. I hear you, so am I.

Where do I go from here? I plan to finish up Haskell via CIS 194 (surely you can't skip monads). I just got a copy of The Little Typer, and maybe I'll get to Types and Programming Languages post-batch (Stanford's CS 242 looks pretty cool, and even has a paper on the teaching philosophy). RC has Impossible Projects Day on Thursday; here's my excuse to jump in head-first and see where the tides take me.

RC W3D2 - The end of the speed-run

Different people have different learning styles. The style that works well for me is going through the material end-to-end in multiple iterations: the first pass as fast as I can for a high-level overview, then homing in on specific sections. Perhaps this is why learning SICP with David Beazley was pure joy.

I’m at functors in Haskell and I can’t pattern-match as easily.

Time to slow down. Time to reflect. Time to do the other things I mentioned I’ll do at RC: chat more, pair more, write more.