Time flies when you’re having fun. It’s hard to believe it’s been 6 weeks already. I extended my stay at RC to a full batch, but I’ll start layering on intro calls and interview prep. It’s unclear how long the job search will take, so this is a reminder to pace myself.
I presented on Prolog at Friday presentations. I started by introducing declarative programming as a third paradigm alongside imperative and functional programming, talked about a math puzzle that Prolog is particularly well suited for, highlighted the relationship with relational algebra and Datalog, then went into a demo. The main takeaway I emphasized was how much cleaner Datalog is as a query language for expressing recursive queries. Over the weekend I came across Eric Zhang’s senior thesis, which will be a treasure trove... when I get to it.
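To make the recursion point concrete: the classic ancestor relation is two lines of Datalog (compare the recursive CTEs you’d need in SQL). The Python sketch below is mine, with toy family facts; it computes the same relation as a naive bottom-up fixpoint, which is roughly how Datalog evaluation works.

```python
# Datalog version, two rules:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
# Below: the same recursive relation, computed bottom-up in Python over toy facts.

parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

def ancestors(parent_facts):
    """Transitive closure of parent/2, iterated to a fixpoint."""
    ancestor = set(parent_facts)              # rule 1: every parent is an ancestor
    while True:
        derived = {(x, y)
                   for (x, z) in parent_facts
                   for (z2, y) in ancestor
                   if z == z2}                # rule 2: parent(X, Z), ancestor(Z, Y)
        if derived <= ancestor:               # no new facts: fixpoint reached
            return ancestor
        ancestor |= derived

print(sorted(ancestors(parent)))
# [('alice', 'bob'), ('alice', 'carol'), ('alice', 'dave'),
#  ('bob', 'carol'), ('bob', 'dave'), ('carol', 'dave')]
```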
I finished watching the last video in Andrej Karpathy’s series, where he builds GPT from scratch. It’s still not clear to me what makes a neural network a transformer. My takeaway is the use of attention: instead of giving each token in a sequence equal weight, the weighting itself is ‘trained’. For example, the nouns in a sentence may have a clearer meaning when coupled with nearby adjectives rather than with other nouns, so that link is given a higher weight.
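As a concrete picture of that weighting, here is a toy single-head self-attention in NumPy. This isn’t Karpathy’s code: the dimensions are made up and the learned projection matrices are replaced with random stand-ins, but the mechanics are the same. Each token’s output is a weighted mix of all the values, with the weights computed from the data rather than fixed at 1/T.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C, H = 4, 8, 16            # sequence length, embedding dim, head size (made up)
x = rng.normal(size=(T, C))   # one sequence of T token embeddings

# In a real model W_q, W_k, W_v are learned; random stand-ins here.
W_q, W_k, W_v = (rng.normal(size=(C, H)) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v

# Pairwise affinities between tokens, scaled and softmax-normalized per row.
scores = q @ k.T / np.sqrt(H)                    # (T, T)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1

out = weights @ v           # (T, H): each token is a weighted mix of all values
print(weights.round(2))     # the 'trained' weighting, not a uniform 1/T
# (A GPT-style decoder would also mask out future tokens; omitted here.)
```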
It’s actually a great single video for ‘catching up’ on recent developments in training and optimizing neural network models. With regard to attention, he discusses (1) having multiple heads of attention, since tokens have a lot to ‘talk’ about, and (2) alternating between communication and computation (talking and thinking). With regard to deep networks, the innovations are (1) residual connections that act as a temporary ‘highway’ until the deeper layers come online, and (2) layer norm, which is similar to batch norm but normalizes rows instead of columns.
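To pin down the rows-versus-columns point: with activations laid out as a (batch, features) matrix, batch norm normalizes each feature across the batch (a column), while layer norm normalizes each example across its own features (a row), so it doesn’t depend on the other examples in the batch. A quick NumPy check with made-up shapes, leaving out the learned scale and shift:

```python
import numpy as np

rng = np.random.default_rng(1)
acts = rng.normal(loc=3.0, scale=2.0, size=(32, 64))   # (batch, features), made up
eps = 1e-5

# Batch norm: per-column statistics (each feature, across the batch).
bn = (acts - acts.mean(axis=0)) / (acts.std(axis=0) + eps)
# Layer norm: per-row statistics (each example, across its own features).
ln = (acts - acts.mean(axis=1, keepdims=True)) / (acts.std(axis=1, keepdims=True) + eps)

print(bn.mean(axis=0)[:3].round(4), bn.std(axis=0)[:3].round(4))  # columns -> ~0, ~1
print(ln.mean(axis=1)[:3].round(4), ln.std(axis=1)[:3].round(4))  # rows    -> ~0, ~1
```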
On Friday we also reflected on the batch. How did the batch turn out differently from what I wrote in my application (snippets here)?
I ended up doing both functional programming and generative modeling. The functional programming discussions were super fun explorations; I discovered a world much bigger than I thought was possible, especially with effect systems. There’s a lot of underlying theory to functional programming - lambda calculus, type theory, category theory. In the words of Richard Eisenberg, “I’ll never be bored again”.
What did I do in batch?
I led the functional programming study group. I learned Haskell, Prolog, and Idris. I read a lot of posts on the motivations for each language and a little bit of the theory. I discovered that, out of all the CS topics, I really enjoy learning about programming languages (it’s also the only project where I’m on my 6th iteration). The next time Python or Rust or TypeScript gets a new feature, I’ll be motivated to dig up papers for more context on its intellectual provenance.
I started with ML when I picked up programming and I feel I’ve come full circle now with generative models.
I wrote a blog post every day in batch.
What were the highs and lows?
The high point was discovering functional programming as a deep well of ideas that I can always draw on.
The low point was wondering if my excursion into functional programming is escapism.
Maybe it’s worth reflecting a little more on the low point. I felt sad thinking that I’ve been jumping around between different interests and roles, which makes it difficult for gains to compound. Now I’m a little more reassured that experimenting with different things helps provide a broader foundation to build on. The advice from David Beazley and Peter Norvig to those who want to become Python experts is to learn Python, and then everything other than Python. This is the same advice, but extended to life.
Never graduate.