W3D4 - The travails of the blog post writer

I read my own posts over the past 2 days and I’m disappointed. No wit, no style, no panache. Such is the reality of forcing yourself to write.

I often start my day responding to posts and sharing resources where relevant. I’ll make a point to add a section below to highlight these.

The notable event today was attending Modal Labs’ meetup. Modal is building a tool to cut down the wait on each iteration loop when deploying to the cloud. Writing code is more fun in dev because you get fast feedback cycles, while there’s always a bit of ceremony in setting up AWS, configs and so on. Modal makes this easy by letting you run code in the cloud simply by adding a decorator, which is pretty cool.
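A minimal sketch of what that looks like, based on my reading of Modal’s docs (the app name and function are my own toy example, not something from the meetup):

```python
import modal

stub = modal.Stub("hello-cloud")  # toy app name

@stub.function()  # this decorator is what makes the function run in the cloud
def square(x: int) -> int:
    return x * x

@stub.local_entrypoint()
def main():
    # .remote() ships the call to Modal's infrastructure instead of running it locally
    print(square.remote(42))
```

Invoking it with `modal run hello.py` then runs `square` remotely, with none of the usual AWS ceremony.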

I’ve been making a point to attend events as a way to ‘widen the aperture’ in my job search. In the past I’ve gone through friends, ex-colleagues and recruiters, but now I’m curious to see what else is out there. I may still end up going that route; at least now I can say I took the chance to explore a bit more. Plus a lot of what’s being developed with generative AI is so new that it’s much easier to tell what’s getting traction when you can see how excitedly people talk about it.

I had discovered Full Stack LLM bootcamp before through Andrej Karpathy tweeting about it; yesterday I got to meet someone who created the content. I know, I already have other projects in mind, but perhaps it’ll be good homework before the next meetup by Weights + Biases.

I’m only 30 minutes into David Perell’s interview on How Do You Use ChatGPT? but I am blown away. Highly recommend watching the first 5 minutes to see if it’s your thing. What’s so cool about it? I’m very impressed by how deliberate he is about learning about himself and understanding better what he can be world-class at, and I was amazed to see how he uses ChatGPT in that process.

I’m planning to restart the RC Staff Engineer roundtable event next week! I’ll end this post with a quote from Cindy Wu.

also one thing my engineering coach said to me that stuck with me that in every situation you have at least 3 options 1/ do nothing 2/ acceptance 3/ direct action. for a long time i didn't see a difference between 1/ and 2/ but now i see that they are very different. 

i think when i do 1/ i stay grumpy if i am upset about the situation. BUT if i do 2/ then i should feel more at peace with whatever mediocrity or imperfections i see. hope to be able to hone this muscle that helps me find acceptance in the things that we choose not to affect.


All the resources I consumed

All the resources I shared


W3D3 - Focusing on people skills

I wrote yesterday that I didn’t have a project to work on had I done Recurse Center this time around between roles. I realized that’s only half true. Yes, I need to sharpen my programming skills for interviews. However, I do have something that I’ve been focusing on: people skills.

What’s different here is that (1) it’s not directly related to software, and (2) it takes time. It takes time to come up with the questions to ask, to read up on materials, to reflect on the ideas that surface. It’s not “my Raft implementation is working!”. There’s no start and finish. It’s an ongoing process.

It’s easy to fall into the trap of feeling stuck when looking for jobs. I’d like to take the time to explore more this time. Late last year I went to my first networking event since 2015. I look forward to more. I expect to feel out of place, but hey, that’s a chance to grow out of my comfort zone. It’s an opportunity to see what else is out there, what one could serendipitously discover.

In the past I’ve also gotten into the habit of focusing on a single project. I’m going to try having multiple small projects in parallel. Yes, some of them will involve the front end. In fact I played around with Modal as meetup prep and discovered I can use it to run Whisper in the cloud (transcribing an hour-long audio clip takes 8 hours locally).
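For the curious, here’s roughly the shape of it. This is a hedged sketch where the image setup, GPU type and model size are my assumptions rather than the exact code I ran:

```python
import modal

stub = modal.Stub("whisper-transcribe")
# Whisper needs ffmpeg to decode audio files
image = modal.Image.debian_slim().apt_install("ffmpeg").pip_install("openai-whisper")

@stub.function(image=image, gpu="T4", timeout=600)
def transcribe(audio_bytes: bytes) -> str:
    import tempfile
    import whisper

    with tempfile.NamedTemporaryFile(suffix=".mp3") as f:
        f.write(audio_bytes)
        f.flush()
        model = whisper.load_model("base")  # model size is a guess
        return model.transcribe(f.name)["text"]

@stub.local_entrypoint()
def main(path: str):
    print(transcribe.remote(open(path, "rb").read()))
```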

I didn’t complete Advent of Code Day 12. What’s interesting is that I tend to write code that closely represents the problem, rather than a simplification that gets to the answer. For example, I would generate all permutations of a string instead of just counting how many there are. Is this brute force? Sort of, though even after optimizing the string permutation it’s still too slow. I suppose handling a million strings is slower than incrementing a counter to a million.
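A toy illustration of that trap (not my actual Day 12 code; counting permutations of distinct characters is just the simplest example of it):

```python
from itertools import permutations
from math import factorial

s = "ABCDEFGHIJ"  # 10 distinct characters

# What I tend to write: materialize every arrangement, then count them.
n_slow = sum(1 for _ in permutations(s))  # walks all 3,628,800 tuples

# What the puzzle usually wants: compute the count directly.
n_fast = factorial(len(s))  # 10! = 3,628,800

assert n_slow == n_fast
```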

W3D2 - Why you should write blog posts

I try to write blog posts the next day, but I’m writing this on the next next day because I’m stuck on Advent of Code Day 12. In fact I believe this is the first day that I didn’t complete both parts of the puzzle on the day itself.

I started reading interviewing.io posts on the job market (on 2024 and #opentowork) and felt demoralized.

Next I read my blog posts from my most recent batch at Recurse Center (yes this is why you should blog / write - for yourself to look back on). I wondered why I didn’t do RC this time and realized I can’t think of a specific project I want to spend time being heads down on. Reading through past posts, I felt much better coming across this post with the following quote.

One of my favorite phrases picked up in the past two years is the Shangaan phrase Hi Ta Xi Uma, which I learned from Renias Mhlongo, one of the top trackers in Africa. It means “we will find it,” and Renias will keep muttering it when he loses his track and struggles to find the next one. Everything is hard, and usually much harder than we can fathom. All the best people I’ve met through the podcast just don’t let that stop them. They also seem to develop an awareness of this constant difficulty and just become used to it.
We will find it.

W3D1 - Make feedback loops fast

I started doing a bit more background reading for an event I'll be heading to later in the week, organized by Modal. Erik Bernhardsson founded Modal, and while I absolutely love his post on building data teams, the quote below is from a post about Modal itself.

I spent a lot of time as a CTO, and if I had to condense how to make engineers productive into one thing, it would be something like: make the feedback loops fast.

Another interesting thread in the post is data as a new engineering discipline, and how the 'book of best practices' is still currently being written. Related to this is machine learning engineering; both are relatively new and enjoying a resurgence on the back of the popularity of generative AI.

W2D5 - There are no villains in the workplace

We drove up to Point Reyes today, so another sparse post.

I completed Advent of Code Day 7. The instructions clearly refer to a hypothetical Camel Cards game, but I implemented a poker hand strength evaluator, which worked for the example but not for the puzzle input. I went down a rabbit hole trying to debug Part 1, and I was not in a good mood when I started working on Part 2 (especially since I had to wait until after the trip to work on it).
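In hindsight the Camel Cards rules are simpler than poker: there are no straights or flushes, and ties break card by card from the left. A sketch of the hand ranking for Part 1 (my reconstruction, not the buggy poker evaluator I first wrote):

```python
from collections import Counter

ORDER = {card: i for i, card in enumerate("23456789TJQKA")}

def hand_key(hand: str):
    # The hand type is fully determined by the multiset of card counts,
    # e.g. five of a kind -> [5], full house -> [3, 2], two pair -> [2, 2, 1].
    counts = sorted(Counter(hand).values(), reverse=True)
    # Ties break by comparing cards left to right (unlike poker).
    return counts, [ORDER[card] for card in hand]

hands = ["32T3K", "T55J5", "KK677", "KTJJT", "QQQJA"]
print(sorted(hands, key=hand_key))  # weakest to strongest
```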

Cindy Wu and I co-organized an RC meetup in SF! It was great seeing people at the start of the New Year. I made a note to listen to Avery Burke and Stevie Hryciw being featured on the podcast Topic Lords.

I did get to listen to Will Larson's interview on Lenny's Podcast. This quote stood out to me.

People create simplistic narratives to find villains that they work with. There are no villains in the workplace. They're just people with complex incentives that are doing complex things.

W2D4 - How do you use ChatGPT?

I didn't take a lot of notes about what I did today and I'm a tad behind, so it'll be a sparse post.

Advent of Code Day 6 was comparatively more straightforward than other days. I spent a bit of time getting ChatGPT to add docstrings to my code so far, but had to emphasize that it should only add comments and not change the code itself (I'm rather partial to having newlines).

I watched Geoffrey Litt on Dan Shipper's How do you use ChatGPT podcast series, where they built an app from scratch during the podcast. My front end isn't great so I'm looking forward to replicating what they did. I looked through Dan Shipper's Twitter feed and got to share my love for Thai food.

W2D3 - The sword of strength pales before the shield of wisdom

Author's note: If you're wondering where the quote came from, I got ChatGPT to generate quotes that describe when brute force no longer works. The other one I quite like is 'Brute force is the last resort of the incompetent', allegedly inspired by Isaac Asimov.

In the previous post, I mentioned how I looked up all the talks by Bryan Cantrill but omitted that his talk on Docker / containers was the one that inspired the search. Other notable talks include the golden age of hardware / software co-design and his Monktoberfest 2022 talk (which I highly recommend if you’re a parent).

The other thing that I didn’t mention was attending the ML Applied Projects event. I wanted to get started on 2024 ideas, among them having a local copy of my ChatGPT discussions. During the event I was told these are available to download! I don’t recall seeing this option last year, but I guess that task is done.

The notable event today was the Zig meetup at Bun HQ. It's often refreshing to chat with people in person, and comforting to realize that if you feel awkward at events like these, others likely feel the same way too. Jarred Sumner went through the cool string optimizations Bun does (for example, storing details in the 12 bits of an 8-byte pointer that go unused because the system only needs 52 bits), which is easy in Zig because Zig doesn’t have a string type.

Advent of Code Day 5 was the first day my code that worked for the example wouldn’t work for the puzzle input. For the first part, the solution involved mapping integers to integers, and my lazy implementation generated all the possibilities. The puzzle input made the dictionaries huge (though in my case it strangely didn’t OOM), so I made the dictionaries more compact by storing only the values needed to make the translation.

For the second part, we needed to run different ‘seed’ values through multiple integer-to-integer mappings to find the minimum final output. The brute force implementation would have taken a long time. The fix involved two changes: (1) finding the minimum contiguous range across all lookups so you know how many values you can skip (for example, if you know 50 maps to 10 and 55 maps to 15, then you only need to keep track of 10), and (2) filling the gaps in the dictionary so you can do the first change more effectively (there’s a sketch of the range idea after the timings below). The times I measured:

w/o steps     : not done even after ~4 hours
w/ step 1     : done after ~2 hours
w/ steps 1 + 2: done after ~4 mins
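Here’s a hedged sketch of the range idea. This is the interval formulation many people used for Day 5 Part 2 rather than my exact code: instead of translating integers one by one, track (start, length) ranges and split them against each (dest, src, length) rule, so whole runs of seeds get skipped at once.

```python
def apply_map(ranges, rules):
    """ranges: list of (start, length); rules: list of (dest, src, length)."""
    out = []
    while ranges:
        start, length = ranges.pop()
        for dest, src, rlen in rules:
            lo = max(start, src)
            hi = min(start + length, src + rlen)
            if lo < hi:  # part of this range falls inside the rule's source range
                out.append((lo - src + dest, hi - lo))
                if start < lo:  # uncovered piece on the left, reprocess it
                    ranges.append((start, lo - start))
                if hi < start + length:  # uncovered piece on the right
                    ranges.append((hi, start + length - hi))
                break
        else:
            out.append((start, length))  # no rule applies: value maps to itself
    return out

# e.g. seeds 79..92 through the example's seed-to-soil map
print(apply_map([(79, 14)], [(50, 98, 2), (52, 50, 48)]))  # -> [(81, 14)]
```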

I’m adding to my reading list the NY Times lawsuit against OpenAI. Simon Willison in his AI in 2023 post described it as “some of the clearest explanations of what LLMs are, how they work and how they are built that I’ve read anywhere.”

All the things:

W2D2 - Why static typing came back

As I'm writing this, I'm letting my brute force implementation of Advent of Code Day 5 run free. My estimate was 3 hours. I'm time-boxing writing this post to 30 minutes, and let's see where we are when I finish (edit: 30 minutes later it looks like I'll need 3 more hours, so the new estimate is 3.5 hours).

Advent of Code Day 4 was definitely more straightforward than Day 24. I got to re-use the scanner I built for Day 24, which is always satisfying.

I watched Why static typing came back on the back of Bob Nystrom's tweet, where he wrote that he loved it (in emphasis). I really enjoyed the talk, so much so that I'm keen to find other talks by Richard Feldman (the last time this happened was probably with Bryan Cantrill), and I even opened up Roc's website.

In a nutshell, the talk described how (1) dynamic languages exploded when the Internet became a thing, and being slow wasn't terrible because the modem was even slower; (2) dynamic languages introduced a number of features that made them more fun to code in, plus faster iteration cycles; and (3) nothing stops static languages from adopting those same features, so with fast compilers and LSPs / IDEs, static languages can get the best of both worlds (but the reverse is not necessarily true).

Later in the afternoon I realized I'd be going to a Zig meetup the next day but hadn't played around much with Zig. I watched Andrew Kelley's Intro to Zig but had a hard time concentrating. I noted how I really like one of the goals of Zig (namely "Raise the standards of software as a craft throughout the industry") and the quote below (which made me think about Ben Kuhn's In defense of blub studies), but skipped to other videos around the 15 minute mark.

Think about colleagues that you have and you can probably rank them in your mind about which ones are more capable and which ones aren’t. There’s a pattern that I noticed and sometimes you think that it’s the person with more years of experience that is more capable but I’ve found that it’s actually just the person who’s willing to dive deep the most and learn the most about the system that they’re working with who actually ends up being the more capable person.

More 'popular' videos I watched instead were Zig in 100 seconds and Initial thoughts on Zig. My takeaways were that custom allocators and comptime are unique to Zig, but I'll need to do more homework to explain this well.

I found the Zig's new relationship with LLVM post really interesting, even if the later parts went over my head. I've been curious about Mojo, and the post illuminated considerations and trade-offs around language toolchains. What's really cool is that making LLVM optional allows Zig to be developed fully in Zig, compile faster, and support in-place binary patching.

W2D1 - Writing a linear solver

I wanted to work on Advent of Code Day 4, so I clicked on the link above Day 3 thinking it was the next problem. I read through the problem to learn that I needed to code up a linear solver. Yes, I did study math in college, but that was a while back.

Inverting a 2x2 matrix wasn't hard, but when I tried replicating the example given I always ended up with the wrong answer. It turns out I had calculated the determinant of the matrix correctly, but multiplied by it instead of dividing.
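For reference, here's the 2x2 case written out (a generic sketch of the textbook formula, not my actual AoC code). Note the division by the determinant, which is exactly where my bug was:

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] by inverting the matrix."""
    det = a * d - b * c
    if det == 0:
        return None  # singular matrix, e.g. parallel lines
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] divided by det.
    x = (d * e - b * f) / det
    y = (a * f - c * e) / det
    return x, y

print(solve_2x2(2, 1, 1, 3, 5, 10))  # 2x + y = 5, x + 3y = 10 -> (1.0, 3.0)
```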

I completed Part 1 and realized the problems weren't sorted in order - above Day 3 was actually Day 24!

Oh well.

For this problem I invested a bit more time writing a scanner. This way, parsing the text needs to be done only once, instead of in multiple passes when using `.split()` (even though from a performance perspective it probably doesn't move the needle).
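A toy version of the idea (simplified from what I actually wrote): walk the input once and emit numbers and symbols as tokens.

```python
def scan(line: str) -> list:
    """One pass over the line, emitting ints and symbol characters."""
    i, tokens = 0, []
    while i < len(line):
        ch = line[i]
        if ch.isdigit() or (ch == "-" and line[i + 1 : i + 2].isdigit()):
            j = i + 1
            while j < len(line) and line[j].isdigit():
                j += 1
            tokens.append(int(line[i:j]))
            i = j
        elif ch.isspace():
            i += 1  # skip whitespace
        else:
            tokens.append(ch)  # punctuation such as ',' and '@'
            i += 1
    return tokens

print(scan("19, 13, 30 @ -2, 1, -2"))
# [19, ',', 13, ',', 30, '@', -2, ',', 1, ',', -2]
```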

What's quite nice is using GitHub Actions as a test runner. I previously did this when experimenting with ChatGPT to improve my code. The small change this time around was to use `ruff` instead of `black`, for no better reason than that's what I kept seeing the cool kids use.

W1D5 - Advent of Code 2023

I wanted to switch gears into something more fun and realized I could get started on Advent of Code.

The Day 1 problems didn't feel as lightweight as what I would imagine Day 1 problems to be. I was tempted to build a scanner that worked both ways (since you want the first and last number in a string, where a number is either '1' or 'one'), but realized after taking a break that it was premature optimization.
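Here's a minimal sketch of the one-way approach (my reconstruction, not necessarily the code I submitted):

```python
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}

def calibration(line: str) -> int:
    """Collect digits (literal or spelled out) in a single forward pass."""
    digits = []
    for i, ch in enumerate(line):
        if ch.isdigit():
            digits.append(int(ch))
        else:
            for word, value in WORDS.items():
                if line.startswith(word, i):
                    digits.append(value)
    return digits[0] * 10 + digits[-1]

print(calibration("zoneight234"))  # 14: 'one' is first, '4' is last
```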

The plan was to code in Python and then port the slow parts to Mojo. Jen Hsin shared a very well-written Advent of Mojo series of posts, which on a quick review notes that a "68,000x speed up is a bit ambitious here since you're not parallelizing lots and lots of matrix multiplies". What's particularly helpful is that one of the posts in the series talks about usability, which would be a good place to start as a newbie.

I realized there was an open question on Mojo from the day before, so got around to take a closer look.

Prompt:  I was listening to a talk on Mojo and came across the notion that GPUs are optimized for matrix multiplies but not necessarily for operations relating to the transformer architecture. Is this really the case?

Response: The statement you heard about GPUs being optimized for matrix multiplications but not necessarily for operations specific to the transformer architecture is partially true, and it relates to how GPUs are designed and how transformer models work.
  • GPU Optimization for Matrix Multiplications: GPUs are highly efficient at performing matrix multiplications, which are a core operation in many deep learning tasks. This efficiency is due to their parallel processing capabilities and architecture, which is well-suited for operations that can be broken down into smaller, simultaneous tasks – like matrix multiplications.

  • Transformers and Matrix Multiplications: Transformer models, popularized by architectures like GPT (Generative Pretrained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), rely heavily on matrix multiplications. These are part of the attention mechanisms and feed-forward networks within the transformer layers. GPUs are well-suited for these aspects of transformers.

  • Other Operations in Transformers: However, transformers also involve other types of operations that are not purely matrix multiplications. For example, operations in the attention mechanism like the calculation of attention scores (involving softmax functions) or certain element-wise operations might not be as optimally performed on GPUs as matrix multiplications. While GPUs can certainly handle these tasks, they are not as inherently optimized for them as they are for matrix multiplications.

  • Emerging Hardware for Transformers: Recognizing the growing importance of transformer models, there's ongoing research and development in creating hardware that's more specifically optimized for the needs of these models. This includes both modifications to GPU architectures and the development of specialized processors (like TPUs - Tensor Processing Units by Google) that can handle the mix of operations in transformers more efficiently.

In summary, while GPUs are indeed optimized for matrix multiplications and perform well for many operations in transformer architectures, they are not necessarily optimized for every specific operation within those architectures. This has led to an interest in developing hardware that can better cater to the unique demands of transformer models.