Time flies when you’re having fun. It’s hard to believe it’s been 6 weeks already. I extended my stay at RC to a full batch, but I’ll start layering on intro calls and interview prep. It’s unclear how long the job search will take, so this is a reminder to pace myself.
The final video in the `makemore` series involves making the previous model deeper with a tree-like structure. The end result is a convolutional neural network architecture inspired by WaveNet. What’s particularly interesting is the convolutional aspect: a technique more commonly associated with images is used so that the ‘for’ loop over sequence positions runs efficiently inside the CUDA kernel rather than in Python. Emphasis is also placed on building out an experimental harness to track how training and validation loss change over time.
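To make that concrete, here’s a toy sketch of one tree level (my own reconstruction, not the code from the video; all sizes are made up): a width-2, stride-2 convolution fuses adjacent pairs of character embeddings, and the loop over positions happens inside the kernel.

```python
# A toy sketch of the tree-like fusion (my own reconstruction; sizes are
# made up): each width-2, stride-2 convolution fuses adjacent pairs, halving
# the sequence length, and the loop over positions runs inside the kernel.
import torch
import torch.nn as nn

B, T, n_embd, n_hidden = 4, 8, 10, 64  # batch, context length, embed/hidden sizes
x = torch.randn(B, n_embd, T)          # embedded characters, channels-first

level1 = nn.Conv1d(n_embd, n_hidden, kernel_size=2, stride=2)
level2 = nn.Conv1d(n_hidden, n_hidden, kernel_size=2, stride=2)

h = torch.tanh(level1(x))   # (4, 64, 4): characters fused in pairs
h = torch.tanh(level2(h))   # (4, 64, 2): pairs of pairs fused
print(h.shape)              # torch.Size([4, 64, 2])
```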
I switched my focus from ML to data infra about 3 years ago. The natural question to ask as I start looking into ML again is: what did I miss?
The next video I watched in Andrej Karpathy’s Neural Networks: Zero to Hero series is where he starts building `makemore`, a model that takes in words and ‘makes more’ words like it. In the video, the model uses human names as training data and generates words that sound like human names.
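The starting point in the video is a bigram model: count how often each character follows another, then sample from those counts. Here’s a toy sketch of the idea (my own, with a stand-in names list rather than the dataset from the video):

```python
# A toy bigram version of the makemore idea (my own sketch; the names list
# is a stand-in, not the dataset from the video): count character-to-character
# transitions in the training names, then sample new names from those counts.
import random
from collections import defaultdict

names = ["emma", "olivia", "ava", "isabella", "sophia"]  # stand-in training data

counts = defaultdict(lambda: defaultdict(int))
for name in names:
    chars = ["<"] + list(name) + [">"]  # '<' starts a name, '>' ends it
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1               # how often b follows a

def sample_name():
    out, ch = [], "<"
    while True:
        options = list(counts[ch].keys())
        ch = random.choices(options, weights=list(counts[ch].values()))[0]
        if ch == ">":
            return "".join(out)
        out.append(ch)

random.seed(0)
print([sample_name() for _ in range(3)])  # three name-ish strings
```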
I think it’s interesting that if you had asked people 10 years ago how AI was going to have an impact, most of them would have told you, with a lot of confidence: first, it’s going to come for the blue-collar jobs, the people working in factories, truck drivers, whatever. Then it will come for the low-skill white-collar jobs. Then the very high-skill, really high-IQ white-collar jobs, like a programmer or whatever. And then, very last of all and maybe never, it’s going to take the creative jobs. And it’s going exactly the other direction.
The first draft of this post was handwritten on paper. This was by design.
After a conversation with a friend last week, I decided to switch gears to learning how ChatGPT works under the hood. I also took the opportunity to update my LinkedIn profile. I was initially hesitant to do this: partly because I wanted to concentrate on functional programming, partly because I wanted to spend a bit more time deliberately thinking about what I’d like to do next.
At the start of RC, I wasn't sure whether to learn functional programming or spend time on Andrej Karpathy's videos. Why Andrej Karpathy? I really enjoyed the material he put together for the Stanford CS231n course on computer vision, which then inspired me to come up with a 1-hour intro on neural networks for SquareU.
I spent a bit of time going through the video on a minimal backprop implementation; it’s deceptively simple. On a related note, the intro above had a graphic showing how the weights change as they go through stochastic gradient descent, but only for a 1-layer network.
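For reference, the setting in that graphic boils down to something like this (a minimal sketch of my own, with made-up sizes): one SGD step on a 1-layer network, showing exactly how the weights move.

```python
# A minimal sketch (my own reconstruction, sizes made up) of one SGD step on
# a 1-layer linear network: forward pass, loss, gradient via the chain rule,
# then a small step against the gradient.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # one layer: 3 inputs -> 2 outputs
x = rng.normal(size=3)        # a single training example
y = np.array([1.0, 0.0])      # its target

pred = x @ W                            # forward pass (linear, no activation)
loss_before = ((pred - y) ** 2).sum()   # sum-of-squares error

grad_W = np.outer(x, 2 * (pred - y))    # dloss/dW_ij = x_i * 2 * (pred_j - y_j)
W -= 0.05 * grad_W                      # the SGD update: step against the gradient

loss_after = (((x @ W) - y) ** 2).sum()
print(loss_before, loss_after)          # the loss shrinks for a small enough step
```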
The rest of the day I used ChatGPT as a consumer. The first use case was creating Anki flashcards; I did a lot of courses at Bradfield, but I’ve also forgotten a lot of what I learned. GPT-4 is very good at creating flashcards. For the final version I pasted text from OSTEP into the prompts, but the question-and-answer sets it came up with when I tried prompting without any OSTEP text (hence completely made up) were impressive!
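For anyone who prefers the API to the chat UI, the same workflow looks roughly like this (a sketch under my own assumptions: the model name, prompt wording, and excerpt placeholder are illustrative, not what I actually typed):

```python
# A sketch of the flashcard workflow via the OpenAI API (model name, prompt,
# and excerpt are illustrative assumptions; I used the chat UI, not the API).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

excerpt = "..."    # paste a passage from OSTEP here

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Create 5 Anki-style question/answer flashcards "
                   f"from the following textbook excerpt:\n\n{excerpt}",
    }],
)
print(response.choices[0].message.content)
```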
What I wasn't able to get GPT-4 to do was implement Raft in Python. I copied the tests from my implementation into the prompt, and tried to get GPT-4 to write code that would pass them. If errors were raised, I would paste the error in full as the next prompt to get the next iteration. This failed on the same test 10 times in a row. Next time I'll try helping it along after repeated failures, to see how far we get.
The amusing part was how, after a few failed iterations, ChatGPT tried to change the function’s parameter count from 4 to 5 and asked me to change the tests accordingly! I shared this with a friend, who responded as follows.
A candidate tried that with me during an interview. Human-level AI.
I met up with a friend over coffee last week. He’s a founding engineer at a hybrid workplace platform. I was keen to hear about the job market. All he wanted to talk about was GPT.
OK. The path is clear.
Events at RC usually start a few minutes past the hour, and it’s always more fun to spend the time in between with icebreaker questions. This week we had “What’s your favorite programming language mascot?”. I was going to go for Rust’s Ferris, but given this list, my vote has to go to the wonderfully apt Docker whale (which I learned is called Moby Dock).
Differential dataflow is a data-parallel programming framework designed to efficiently process large volumes of data and to quickly respond to arbitrary changes in input collections.
My first time at RC, I would go down rabbit holes for as long as they were fun. This time at RC, I wanted to be more deliberate. In any case, extremes are bad. At least this is how I justified going from working on a series of Prolog exercises to thinking more broadly about how things fit together.
I recall committing to specific goals on Pi Day in the past, but perhaps I’ve forgotten what they were because I never shared them publicly. This Pi Day I commit to thinking more deliberately about my long-term career trajectory and to making choices in a way that the gains compound.
Fight this urge whenever possible. Know your tools. Accept the abstractions, but only once you’ve studied their implementation and understand their limitations. You’ll never have enough time to do this for every tool that you use, but if you do it for even a small fraction of them you’ll reap massive benefits.
At some point in your career you’ll have to start taking responsibility for your career trajectory. The key insight is that you should make decisions about what projects to work on and which teams and companies to join strategically, not tactically. Think long term.
Nothing will accelerate your growth faster than spending all day working with other very good engineers. The moment you start to feel like you’re not learning from the engineers around you is the moment you should start looking for a new team.
On occasion I start working on something and then forget why I got started in the first place. This time at RC, I find it helpful to go back to SICP to answer questions like “why learn Prolog?”.
Baker, Cooper, Fletcher, Miller, and Smith live on different floors of an apartment house that contains only five floors. Baker does not live on the top floor. Cooper does not live on the bottom floor. Fletcher does not live on either the top or the bottom floor. Miller lives on a higher floor than does Cooper. Smith does not live on a floor adjacent to Fletcher’s. Fletcher does not live on a floor adjacent to Cooper’s. Where does everyone live?
```prolog
%% select/2: pick each element of the first list from the second,
%% without replacement (select/3 is the library(lists) built-in).
select([A|As], S) :- select(A, S, S1), select(As, S1).
select([], _).

dinesmans(X) :-
    %% Baker, Cooper, Fletcher, Miller, and Smith on different floors
    %% of an apartment house with five floors.
    select([Baker,Cooper,Fletcher,Miller,Smith], [1,2,3,4,5]),
    %% Baker does not live on the top floor.
    Baker =\= 5,
    %% Cooper does not live on the bottom floor.
    Cooper =\= 1,
    %% Fletcher does not live on either the top or the bottom floor.
    Fletcher =\= 1, Fletcher =\= 5,
    %% Miller lives on a higher floor than does Cooper.
    Miller > Cooper,
    %% Smith does not live on a floor adjacent to Fletcher's.
    1 =\= abs(Smith - Fletcher),
    %% Fletcher does not live on a floor adjacent to Cooper's.
    1 =\= abs(Fletcher - Cooper),
    %% Where does everyone live?
    X = ['Baker'(Baker), 'Cooper'(Cooper), 'Fletcher'(Fletcher),
         'Miller'(Miller), 'Smith'(Smith)].

main :- bagof(X, dinesmans(X), L)
        -> maplist(writeln, L), nl, write('No more solutions.')
        ;  write('No solutions.').
```
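If I’ve traced the constraints correctly by hand, running `main` prints the single solution: Smith on floor 1, Cooper on 2, Baker on 3, Fletcher on 4, and Miller on 5.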
Verse and Mercury are functional logic languages; Mercury even calls itself ‘Prolog meets Haskell’. Details will follow in later posts.