RC W2D5 - Moving between extremes

There are times I’m stuck not knowing what to write about, and resign myself to the possibility that the post may end up being just a single paragraph that I forced myself to write. Then I start writing that paragraph and end up with a few paragraphs, and wonder how I resigned myself to writing just one.

Note to self: next time this happens, read back through previous posts and see what comes up (especially since I tend to leave breadcrumbs around).

At times I feel going into Haskell is escapism. Let's stick to it for a week and see where this goes.

My plan for RC was to learn Haskell. I felt a little demotivated on Day 1 and resolved to try it out for at least a week (it's Week 2, and I'm still at it). I wondered whether learning Haskell was my form of escapism, since I have a tendency to move between extremes. Previously I went all out learning about performance. Now I go all out learning about correctness. I start thinking about the possibility of working on correctness. I realize correctness is hard too. Existential crisis on Day 1 of RC.

I quoted Matt Klein previously and I can see parallels here. Soft skills don't come as naturally to me, so I decide to go hard into technical skills. Hard skills are hard too. It's a good quote, so I'll share it again here.

One last thing: don't let anyone tell you that the tech/engineering is the easy part. It's not. It's hard. Soft skills are also hard. It's ALL hard, and both are required to succeed.

It's common that we swing between extremes. It's a good thing that we honor the time we have in this world through dedication. However, it gets complicated when that dedication becomes so ingrained in one's identity that it's hard to take an honest look and ask whether we might be doing something for the wrong reasons.

OK not to play therapist here (perhaps a topic for a different day). I did recommend Stutz to a friend recently, and she asked me if it was relatable. I said to watch the first 10 minutes, and if it’s not your thing now but becomes a thing 10 years down the road and you think about watching it again, that 10 minutes would be worth its weight in gold.

RC W2D4 - Comparing performance of left vs right fold

Today I prepared slides for Friday's presentation. I always find it helpful to do a dry run the day before; I'm told magic happens when you sleep on it.

I was happy to draw a line connecting SICP to Haskell; it's a good periodic reminder when delving into functional programming. What's new is feeling pulled into a lot of theory (lambda calculus, type theory, category theory, proofs), discovering this is normal, and realizing that time spent on all this still doesn't cover what the compiler does in practice.

Re: the compiler, I was previously under the impression that lazy evaluation can be inefficient because you're doing the same calculation more than once. For example, in calculating `square (1 + 2)`, you have to do `(1 + 2)` twice in `((1 + 2) * (1 + 2))`, instead of just once if you do the sum first before you square. The quote below is from the CIS 194 lecture notes.

In particular, GHC uses a technique called graph reduction, where the expression being evaluated is actually represented as a graph, so that different parts of the expression can share pointers to the same subexpression.

My read is that Haskell builds abstractions that don't translate as naturally to what the processor does, so the compiler has to do heavy lifting for performance reasons. The code still runs fast, but maybe you'll need to know the compiler a bit better when you're trying to reason about your program or diagnose bottlenecks. This theme came up in the FP study group, on the topic of the additional 'bookkeeping' Clojure does under the hood to ensure your dictionaries are immutable.
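To make the sharing idea concrete, here's a minimal sketch of my own (not from the notes) of what graph reduction buys you:

```haskell
-- With naive substitution, square (1 + 2) would rewrite to
-- (1 + 2) * (1 + 2), computing the sum twice. With graph reduction,
-- the argument is a single shared thunk evaluated at most once:
square :: Int -> Int
square x = x * x  -- both uses of x point to the same thunk

main :: IO ()
main = print (square (1 + 2))
```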

In any case, I tried running the left fold vs right fold example here locally. This involves concatenating 100 million integers and summing them up (I didn't get a stack overflow like in the post, which is impressive heavy lifting!). This is for the right fold.
```
stack exec cis194-exe  2.27s user 0.85s system 116% cpu 2.672 total
```

Now the left.
```
stack exec cis194-exe  13.14s user 2.97s system 101% cpu 15.849 total
```

That's 6x! Reminder to self to read up more on weak head normal form (WHNF). My understanding so far is that left folds are more expensive here because you need to expand everything out before evaluation starts, whereas with right folds you get to keep things in WHNF until the evaluation happens.
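For reference, here's a minimal sketch of the kind of program being timed - my own simplified reconstruction that just sums, not the exact CIS 194 code:

```haskell
-- Sum 100 million integers with a right fold.
-- Swap foldr for foldl to reproduce the slower left-fold run.
main :: IO ()
main = print (foldr (+) 0 [1 .. 100000000 :: Int])
```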

One of my best memories at work involves Python code my previous manager wrote in a more functional style using `functools`. On discovering that it actually ran slower than the idiomatic Python version, another colleague said, "It's harder to read, it's slower, so we're writing it this way for funsies?". Everyone laughed. My stomach hurt. It was amazing.

RC W2D3 - Payments and chargebacks

For non-technical talks, I presented on payments and chargebacks. In the spirit of the talk being non-technical, I left out the parts on the software stack and ML models (perhaps for a different post). The write-up below is roughly what I talked about.

When we think about a transaction, we think about a buyer making a payment in exchange for goods or services from the seller. To start off, let’s think about the payment in cash [1].

The cash model keeps things simple. However, it does require the buyer to have cash on hand, and more importantly, the buyer may not yet have enough cash. This is where credit cards come in, making things convenient by solving the portability and credit issues.

With credit cards, we’re now looking at more parties to the transaction. There is the credit card issuer, often referred to as the issuing bank. The funds now need to go to the merchant’s bank, also known as the acquiring bank. Finally we need an intermediary to go between the two banks, and in the spirit of keeping things simple, we’ll think of it as a single party called the payments processor [2].

More parties means more fees. Let's say 3% is taken out of the payment, so the seller gets 97 cents on every dollar. Most of that (roughly two-thirds) goes to the issuing bank.

The convenience also comes at the cost of potential fraud. A fraudster with stolen credit card details may pretend to be a legitimate seller and 'cash out' by processing the stolen details as a legitimate sale, i.e. seller fraud. The fraudster may also pretend to be a legitimate buyer and purchase goods/services from a legitimate seller, i.e. buyer fraud. Another possibility is compromising the seller's account and redirecting the funds to the fraudster's bank account, i.e. account takeover.

The payments processor may also be on the hook for a well-intentioned but poorly run business. Imagine a seller who sells concert tickets but is unable to put on the concert. The buyers may reverse their transactions, but the seller may be out of funds. Since the payments processor 'fronts' the payments to the seller, it essentially takes on the credit risk.

The reversal request by the buyer is referred to as a chargeback. In fact, it's often less of a reversal and more of a 'mini court case' called a dispute, to decide which party should be on the hook. The issuing bank asks the buyer for more details, then the acquiring bank asks the seller for more details, and finally a determination is made (who decides often depends on the type of dispute).

For a finer breakdown of the different parties and the respective transaction fees, the Bloomberg article here has an excellent graphic. Modern Treasury has a great resource here. I enjoy reading Patrick McKenzie’s Bits about Money newsletter, and more recently listened to the Driven by Insight podcast interview with Alex Rampell. For chargebacks, Square has a Chargebacks 101 here.


[1] You can swap out the payment and/or the goods/services for the promise of payment/delivery, but there has to be a two-way exchange.

[2] This abstracts away the merchant services provider, payments gateway, etc.

RC W2D2 - Lazy vs eager evaluation

In a previous post, I shared how my plan at RC was inspired by the courses I did with David Beazley. I wasn’t sure then if I was paraphrasing him accurately, but yesterday he wrote in his newsletter:

Meanwhile, I've managed to dig myself into a pretty deep rabbit hole as a result of the month-long courses I offered back in January. The extra time allowed me to make an attempt at writing the compiler project in Haskell, which to be honest, kind of blew my mind and exposed a totally different way of doing things in Python. I'll have more to say about that once I finish putting the pieces of my brain back together.

Learning a new language usually starts off a little painful, but you feel a bit better once you start to string words together - this applies to both natural and programming languages. Haskell is not only in the functional paradigm but also has an expressive type system. It takes a bit of getting used to, but so far so good.

Chapter 4 of SICP involves writing a Scheme interpreter in Scheme, the so-called metacircular evaluator. This exercise also introduces the notion that you can make the language do what you want, for example make Scheme use lazy evaluation instead of eager (SICP uses the terms ‘applicative’ vs ’normal’ order, discussion here). I was intrigued. What else do you need to couple with lazy evaluation to make the language ‘work’?

From the CIS 194 lecture notes.

Lazy evaluation makes it hard to reason about when things will be evaluated; hence including side effects in a lazy language would be extremely unintuitive. Historically, this is the reason Haskell is pure: initially, the designers of Haskell wanted to make a lazy functional language, and quickly realized it would be impossible unless it also disallowed side effects.

I assumed Haskell was intended to be pure from the start. TIL it's a corollary of wanting the language to be lazy.

An exercise in SICP involves representing map and filter operations as special cases of reduce. In fact, SICP uses the term 'accumulate' for reduce, while Haskell uses 'fold' - we'll use the latter term from here onwards. What I discovered in Haskell that I didn't cover in SICP are the practical implications of left fold (`foldl`) vs right fold (`foldr`).
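As a refresher, here's that exercise in Haskell terms - a minimal sketch of my own, with hypothetical names, using a right fold:

```haskell
-- map and filter expressed as special cases of foldr
mapAsFold :: (a -> b) -> [a] -> [b]
mapAsFold f = foldr (\x acc -> f x : acc) []

filterAsFold :: (a -> Bool) -> [a] -> [a]
filterAsFold p = foldr (\x acc -> if p x then x : acc else acc) []

main :: IO ()
main = do
  print (mapAsFold (* 2) [1, 2, 3])       -- [2,4,6]
  print (filterAsFold even [1, 2, 3, 4])  -- [2,4]
```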

First, a more illustrative explanation. Fold lets you apply an operation successively over a list; here we sum up the values in a list using a left fold.
```
foldl (+) 0 [1, 2, 3]
= foldl (+) (0 + 1) [2, 3]
= foldl (+) ((0 + 1) + 2) [3]
= foldl (+) (((0 + 1) + 2) + 3) []
= (((0 + 1) + 2) + 3)
= ((1 + 2) + 3)
= (3 + 3)
= 6
```

This is in contrast with right folds.

```
foldr (+) 0 [1, 2, 3]
= 1 + foldr (+) 0 [2, 3]
= 1 + (2 + foldr (+) 0 [3])
= 1 + (2 + (3 + foldr (+) 0 []))
= 1 + (2 + (3 + 0))
= 1 + (2 + 3)
= 1 + 5
= 6
```

(Note that `foldr` still consumes the list from the front; it's the parentheses that nest to the right.)

In an eager language, left folds are memory-efficient because each successive evaluation can be done in constant space, i.e. `0 + 1`, then `1 + 2`, then `3 + 3`; this is usually referred to as tail call optimization. With right folds, we need to build up each pending operation in a stack frame.

In a lazy language, however, expressions are only evaluated as needed. This means that with left folds, we end up with a chain of unevaluated expressions (called 'thunks') as large as the list itself. This is in contrast with right folds, where a single expansion step exposes the outermost application and further evaluation can be 'suspended'.
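As an aside, the usual fix on the left-fold side is the strict `foldl'` from `Data.List`, which forces the accumulator at each step. A minimal sketch (my own example):

```haskell
import Data.List (foldl')

-- foldl (+) 0 [1..n] would build the thunk (...((0 + 1) + 2)...) + n
-- before evaluating anything; foldl' evaluates the accumulator at each
-- step instead, so the fold runs in constant space.
main :: IO ()
main = print (foldl' (+) 0 [1 .. 100000000 :: Int])
```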

A key difference is that right folds can work on infinite lists, whereas left ones can't. Mind blown.
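A quick way to see this (my own example): with a combining function that's lazy in its second argument, a right fold can stop early.

```haskell
-- Terminates even on an infinite list: once (||) sees even 2 == True,
-- it never forces the rest of the fold. A left fold would loop forever.
anyEven :: [Int] -> Bool
anyEven = foldr (\x acc -> even x || acc) False

main :: IO ()
main = print (anyEven [1 ..])  -- True
```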

The best explanation I found is a comment in a GitHub PR. This Stack Overflow post has a performance comparison between the two (spoiler: right fold is faster). This Haskell wiki page expands a bit more on the eager version of folds.

Next question - what explains the coupling between the functional paradigm and an expressive type system?

RC W2D1 - Sequels are hard

In my previous post, I shared snippets from my RC application and how my plans have changed over the first week. This was inspired by Cindy Wu sharing her own RC application, amusingly when she had trouble writing about RC.

I share this sentiment. My Week 1 recap from my first batch in Fall 2020:

What can I say? Week 1 of RC has been a blast! It’s affirming seeing the diversity of backgrounds and knowledge. As a self-taught programmer, this week helped illuminate how the path towards mastery is not linear. It’s multi-dimensional.

The prior recap, however, also had a quote by Paulo Coelho, a recording of Glenn Gould's Goldberg Variations, the Jiro Dreams of Sushi trailer, and an FT Weekend article on the École Nationale d'Administration (since permanently closed) moving from Paris to Alsace. These were snippets collected over the years, waiting to be shared with the world.

It’s true what Naval Ravikant says about sequels.

Maybe the first run, they took an idea they’ve had for a decade or three decades and they expounded on it and it sounded great. And then in the sequel, whether it’s a book or a movie, they have to come up with something equally good, but they only have two years to do it. And it’s just really difficult to do.

RC W1D5 - Why I'm learning Haskell at RC

I had been wondering what to write about to recap Week 1 of RC. I came across a post by Cindy Wu sharing her RC application, which inspired me to review my own. My plan at RC has narrowed a fair bit. To the question "What would you like to work on at RC?”, I described two ideas.

The first is to work through the Haskell, Prolog and Idris sections of Seven Languages in Seven Weeks. I love seeing the trade-offs different languages make, and keen to extend functional programming best practices to imperative code. This would also be an extension of the SICP course I completed recently. A lingering wonder I have is how concepts like composition help frame abstractions for domain-specific software - what's the reason Cognitect joined Nubank and Jane St love OCaml?

The second is to build a toy ChatGPT clone. I've build [sic] machine learning and neural network models from scratch before (and presented on them at RC and at my workplace), but generative models are more recent and a lot more resource-intensive. My plan is to spend about two weeks to get a basic understanding how they work, and either build one from scratch or use an existing model and 'retrain' one to write in my writing style say.

Re: the first idea, I had a quick look around Idris resources and my impression was that I would get a lot more out of Idris if I had spent more time on Haskell (or at least, more than 1 week). I'm also thinking about functional programming and types as a focus; Prolog is logic programming, so it's on hold for now.

Re: the second idea, it's on hold. There's a lot of interest in the fast.ai course (which is a lot more comprehensive), not so much in just going through the Andrej Karpathy videos. To the question "Are there things you would do differently if you came back?", I feel focusing on the first idea also lets me do the following better:

I would be more deliberate in listening to what my batch mates are interested in, and use that to choose things to work on based on what I can collaborate and pair on.

Now circling back, why Haskell?

I had a manager once who enjoyed functional programming so much, he named his son Haskell (yes, it's an actual name, and hearing this got me more excited about the company). My previous RC batch had an active study group on functional programming, so there's also a bit of catching up on the fun I missed.

More recently, I did David Beazley's course on SICP (details here, write-up here). Learning the trade-offs that different languages make definitely helped me feel more confident in my main language Python (which is imperative). I followed up the SICP course with compilers. Here Dave implemented a compiler in both Python and Haskell, and described how the parser combinator he built for the latter blew his mind.

If someone with that level of Python mastery can learn something new from Haskell, maybe I can too (on mastery, here is Dave live-coding a WebAssembly interpreter on stage).

I admit the things I've been excited about have moved around a lot: machine learning, neural networks, build and deploy systems (Bazel, Docker, Kubernetes), WebAssembly, Go, now functional programming. I look at PhDs and resident experts at VCs and go, "What will I be an expert in?".

Maybe I won’t find one. Maybe I’ll continue cycling between consuming everything I can on a topic and burning out. Maybe what I learn doesn’t compound that much, or perhaps, as naturally.

Then again, maybe it’s better to have ups and downs rather than be passive. In the words of my significant other:

There's something lethargic at best and dangerous at worst about idleness.

RC W1D4 - Domain-specific software (sketch)

OK we’re back on to 'sharing sketches of my blog post ideas along the way’, here we go.

When I think about CS topics, I think about operating systems, databases, networking etc. These are domain-agnostic. I began to ponder: are there specific aspects of a language, framework or paradigm that are particularly useful for working in a specific domain (finance, healthcare, education etc)? What choices would you make based on what you're optimizing for? Is domain-specific software a thing?

SICP touches on this at the start of Chapter 3:

One powerful design strategy, which is particularly appropriate to the construction of programs for modeling physical systems, is to base the structure of our programs on the structure of the system being modeled.

This preface led to a discussion of mutability and functional vs imperative models. A related idea is the argument that the functional model allows us to focus on the domain, and not the step-by-step execution.

I had a coffee chat with Jacquin Mininger which led to a number of threads and resources, as sketched out below. I'll likely finesse these once I've done a bit more homework. Mistakes are my own.

Types - With strong types, we're able to rule out illegal states (since we can't use a function that operates on strings to work on integers, say). With static types, we're able to run these checks at compile time. With an expressive type system, you can even represent, say, a EUR/USD pair with its own type instead of a float, so it's not possible to mistake it for a GBP/USD pair (I believe this example is from here, which I need to re-read). A minimal sketch of this is below.
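The sketch (my own, with hypothetical names; newtypes have no runtime cost):

```haskell
-- Each currency pair gets its own type, so the compiler
-- rejects mixing them up.
newtype EurUsd = EurUsd Double
newtype GbpUsd = GbpUsd Double

eurToUsd :: EurUsd -> Double -> Double
eurToUsd (EurUsd rate) eur = eur * rate

main :: IO ()
main = print (eurToUsd (EurUsd 1.08) 100)
-- eurToUsd (GbpUsd 1.27) 100  -- type error: GbpUsd is not EurUsd
```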

Pure functions - When you don't have side effects, your code is easier to understand because its impact is isolated. It also makes refactoring easier. A thought that came up here is Jane St being a fan of OCaml. While the functional aspect is a plus, it's easy to overlook that the bigger plus comes from the functional style and the type system working together to maximize how much of the domain can be 'expressed' in code.

Proofs - These provide an even stronger check. Agda and Coq are popular choices, but can be slow. Haskell is faster. Rust is even faster and has a much larger supporting ecosystem.

Math - There are some domains like crypto where the desired behavior is expressed in math, which also translates more naturally into Haskell.

Performance - It's easier to reason about performance when using imperative languages because the code is closer to what the processor is doing; this is correspondingly harder, or at least less natural, in a functional language. It's also easier to reason about performance when you don't have to worry about the garbage collector 'stopping the world'; here Rust, having no runtime, can fare better than Go.

Haskell - Related to the performance point above, this also means that you’ll need to know the compiler / GHC well to figure out where your bottlenecks are when writing Haskell.

Finance and crypto - The 'return on investment’ of using functional languages may be higher here given the emphasis on correctness. While this may weigh the choice towards correctness over performance, there are subdomains like high-frequency trading where you shouldn't be in business if you don't have performance.

Other domains - I have very little context on whether domain-specific software is a thing outside of finance and crypto, and would naturally love to hear thoughts. I also have only the one example of Nubank acquiring Cognitect / Clojure, which anecdotally reinforces the finance example.

RC W1D4 - Thinking long-term

In my previous post, I shared thoughts on building patience. A discussion followed on RC chat which highlighted two aspects of patience.

The first relates to ‘patience to deal with finicky technical things’. I feel I’ve gotten better at this over time. In earlier days I would Stack Overflow just enough to get past the problem I was facing. More recently I try to be more deliberate on spending more time to read documentation, and where necessary, the source code. Ben Kuhn says this best in his post.

The second, which is what I had in mind, relates to 'patience in dealing with other people and organizational issues'. I used to (and in weaker moments still do) focus on getting OKRs done fast and aim to be the '10x engineer' on the team. Now I try to be more deliberate about the longer-term issues surrounding the work. How does the work relate to my team's goals? Stakeholder teams' goals? The org's goals? What does the company product roadmap look like, and how does the work fit into the bigger picture? A few months back I wrote a design doc that other engineers executed on, which I hope is the first of many.

Another way to think about the latter form is: if you were hired to build out a team to achieve company objectives, how would you go about doing it? This requires a mental shift from thinking about what you would do in a quarter to how you would prioritize, execute and hire over multiple years. This framing also helps you empathize with the challenges your manager is facing.

RC W1D3 - Building patience

My recent posts have been more reflective compared to my first time at RC. I suppose it’s in line with this goal that I set out:

I would be less of a perfectionist in sharing sketches of my blog post ideas along the way, instead of a mostly finished piece towards the end.

In re-applying to RC, I had a look at my Github history and noticed that 2024 would be 10 years since my first pull request. Perhaps this is why, despite it being the first week of RC, I’ve been feeling a little low.

In the past year, I've also realized my main hurdle has actually been soft skills. I’ve been working on building patience. With patience you can make better long-term plans and build relationships that compound. It helps you be the ‘responsible adult’. Cue Matt Klein.

One last thing: don't let anyone tell you that the tech/engineering is the easy part. It's not. It's hard. Soft skills are also hard. It's ALL hard, and both are required to succeed.

RC W1D2 - Reflections on winning

In my previous post, I wrote:

Partly due to having worked on infrastructure for the past 2 years and wondering, is there more to life than making things run at scale?

Lots of people would love to work on making things run at scale. The point I’m trying to make is, I got into it for the wrong reasons.

My first role in software involved being better at statistics than any software engineer and better at software engineering than any statistician. I was self-taught, and tried to get over not having a CS degree by doing lots of courses.

I pursued making things run at scale because that’s what everyone else wanted to do. It’s the type of work that gets you on the front page of Hacker News.

Ironically this mirrored my first job out of college. I remember reading a post by Nick Chirls describing his journey, and more recently, Alex Rampell.

If you're in a competitive thing, “I want to win” in high school, you want to go to the best college. Part of winning in college is you want to get the best job… If you're 22 and working at McKinsey or Goldman Sachs and the pay isn't even that good relative to other things that you can potentially be doing with your time if you know how to optimize it, it has the imprimatur of “Wow, you are a very successful person!” and that's the hardest job to get after graduating from a school like Harvard.

What’s the plan now? I’m not sure yet. I do know it involves a lot of reflection and talking to lots and lots of people. It involves being a better programmer at RC.