W5D4 - Those who cannot remember the (recent) past are (also) condemned to repeat it

My notes on what I'm doing daily have become sparser; note to self to be more verbose.

What is in my notes is that I read Vicki Boykis' post "What we don't talk about when we talk about building AI apps". In particular, even though we're moving into this new paradigm of LLMs, we still have to contend with bloated Docker containers. The post describes images with deep learning libraries as large as 10 GB!

This actually reminded me of work I did at Airtable to reduce Docker image size, by

  1. Reordering the Dockerfile so that instructions that change less frequently sit higher in the file (so more layers hit the build cache),
  2. Removing unused Python modules, and
  3. Setting AWS CodeBuild to do a deep rather than shallow git clone

I know, it's nothing fancy. The last one was particularly counter-intuitive, but the AWS rep said they're using a Go git client and apparently that made a difference (it's also unclear whether that hack still works). All-in-all, though, a 60% reduction in image size.
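On item 2, the tricky part was finding which installed packages were actually unused. Here's a minimal sketch of the kind of heuristic I have in mind (not what we ran at Airtable): it walks a source tree (the "src" path is a placeholder), collects imported top-level modules with ast, and flags installed distributions that never show up. It naively assumes the distribution name matches the import name, which isn't always true (scikit-learn imports as sklearn), so treat the output as candidates to investigate rather than packages to delete blindly.

  import ast
  import pathlib
  from importlib.metadata import distributions

  # Collect top-level module names imported anywhere in the project.
  # "src" is a placeholder for wherever your application code lives.
  imported = set()
  for path in pathlib.Path("src").rglob("*.py"):
      tree = ast.parse(path.read_text())
      for node in ast.walk(tree):
          if isinstance(node, ast.Import):
              imported.update(alias.name.split(".")[0] for alias in node.names)
          elif isinstance(node, ast.ImportFrom) and node.module:
              imported.add(node.module.split(".")[0])

  # Compare against installed distributions; names that never appear in the
  # import graph are candidates for removal from the image.
  installed = {dist.metadata["Name"] for dist in distributions()}
  candidates = sorted(name for name in installed if name.replace("-", "_") not in imported)
  print("\n".join(candidates))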

Modal discusses building a Docker-compatible custom container runner, image builder and filesystem. Nice.

W5D3 - Overachieving on personal goals

It’s the last day of January, so I thought it worth reviewing the goals I set out for the year - I’m done on 3 out of the 5.

2. Text transcription

I started with OpenAI’s Whisper model to transcribe audio recordings; running it locally took 8 hours for 1 hour of audio. I looked into using Modal to be able to run it in the cloud on GPUs. Then I went to a LangChain meetup and discovered the quantized version of Whisper, which cuts the local running time from 8 hours to 2 hours. We could do better, sure, but this solves my use case.
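For reference, the local run is only a few lines with the openai-whisper package; a sketch (the model size and file name are placeholders):

  import whisper

  # Pick a checkpoint; larger ones are more accurate but slower on CPU.
  model = whisper.load_model("base")

  # "recording.mp3" is a placeholder for the audio file to transcribe.
  result = model.transcribe("recording.mp3")
  print(result["text"])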

4. Twitter bookmarks

I tried out using Twitter’s API but got stuck on auth issues, and I didn’t have confidence the API would work as intended given the issues the company is having. I came across Dewey but didn’t want to pay $10 / month. It turns out it’s easy enough to inspect requests from Chrome and paginate your way to glory.
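The recipe, roughly: load the bookmarks page with Chrome DevTools open, copy the headers (bearer token, cookies, CSRF token) from the request the page itself makes, then keep re-requesting with the cursor each response hands back. The sketch below is just that loop; the URL, header values, and the cursor-extraction helper are placeholders for whatever you see in DevTools, not a documented API.

  import json
  import requests

  # Placeholders: copy the real URL, headers, and query parameters from a
  # bookmarks request observed in Chrome DevTools while logged in.
  BOOKMARKS_URL = "https://x.com/i/api/graphql/<query-id>/Bookmarks"
  HEADERS = {
      "authorization": "Bearer <token from DevTools>",
      "cookie": "<cookie string from DevTools>",
      "x-csrf-token": "<csrf token from DevTools>",
  }

  pages, cursor = [], None
  while True:
      variables = {"count": 100}
      if cursor is not None:
          variables["cursor"] = cursor
      resp = requests.get(BOOKMARKS_URL, headers=HEADERS, params={"variables": json.dumps(variables)})
      resp.raise_for_status()
      page = resp.json()
      pages.append(page)
      cursor = find_next_cursor(page)  # hypothetical helper: dig the pagination cursor out of the payload
      if cursor is None:
          break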

5. ChatGPT conversations

At the Applied ML RC event I was told you can now just download your conversations. OK, I guess we’re done.

W5D2 - The problem with LLMs? Text blobs

I've always been interested in paradigm shifts. Some are major, others smaller in scale.

I came across the instructor library while looking around at tooling around LLMs. The premise is simple - if we now send prompts to language models and get a text blob back, how do we figure out if the text blob has the right structure? For example, how do we make sure the JSON parses correctly or has the right keys?

With instructor, pydantic is all you need. You get data model validation, and pydantic models emit JSON Schema, which is what OpenAI function calling uses. It's a nice intersection of LLMs and type theory.
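A minimal sketch along the lines of the instructor README (the model name and fields here are just for illustration):

  import instructor
  from openai import OpenAI
  from pydantic import BaseModel

  # The pydantic model doubles as the schema the LLM's output must satisfy.
  class UserDetail(BaseModel):
      name: str
      age: int

  # Patch the OpenAI client so chat completions accept a response_model argument.
  client = instructor.patch(OpenAI())

  user = client.chat.completions.create(
      model="gpt-3.5-turbo",
      response_model=UserDetail,
      messages=[{"role": "user", "content": "Jason is 25 years old"}],
  )
  print(user)  # a validated UserDetail instance, not a text blob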

W5D1 - Is ChatGPT making society more unequal?

Every now and again I re-read Tyler Cowen's interview in the New Yorker titled "Are computers making society more unequal?", in particular this quote.

You’ll need to be really good at judging, and being dispassionate, and you’ll have to have a sense of what computers can and cannot do. It’s about working with the machine: knowing when to hold back, when to intervene.

I had the chance to listen to him on the podcast How Do You Use ChatGPT. In particular, he mentioned how ChatGPT will be egalitarian at the start, say by writing good college essays for everyone. Over time, however, leverage will flow to those best able to start projects and allocate resources. Hence the title of this post.

W4D5 - Faster inference with quantized models

Today I attended LangChain's 'hacking hours' event, which presumably was organized to get people ramped up on LangGraph. I had a few interesting chats, though amusingly not about LangChain. I came across Sasha Rush's GPU Puzzles, a set of notebooks that teach GPU programming through puzzles. This led me to Tensor Puzzles, which teaches you how to think in tensors.

I had been using Whisper to transcribe audio files. At the event I was introduced to whisper.cpp, a quantized version of the model obtained by using lower-precision floats and integers. This reduces model size and speeds up inference, and in particular improves performance on CPUs. With Whisper it took 8 hours to transcribe 1 hour of audio; with whisper.cpp I needed just 2 hours!
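Running whisper.cpp is a command-line affair; here's roughly what the invocation looks like, wrapped in Python for convenience. The paths and model file are placeholders for wherever you built the project and downloaded a ggml model:

  import subprocess

  WHISPER_CPP_DIR = "/path/to/whisper.cpp"  # placeholder: your checkout, after building it
  MODEL = "models/ggml-base.en.bin"         # placeholder: a downloaded ggml model
  AUDIO = "/path/to/recording.wav"          # placeholder: 16 kHz mono WAV input

  # -otxt writes a plain-text transcript next to the audio file.
  subprocess.run(["./main", "-m", MODEL, "-f", AUDIO, "-otxt"], cwd=WHISPER_CPP_DIR, check=True)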

It's a bit harder to motivate building a web app now that my use case can be done locally...

W4D4 - Learning to be the responsible adult

My first job out of college was in finance. My role involved coming up with complicated financial structures that minimized the interest rate and currency risk our clients were exposed to when they raised debt financing. I enjoyed the technical aspects, but in a way I feared what my job would entail if I were to rise in seniority.

The senior members of the team brought in business; junior members executed the work. In that way finance is similar to other professional services like law or consulting. That said, I wasn’t sure I wanted my performance evaluated on how much my client liked me, whether that be at the office or on the golf course. What if I don’t like golf?

Over the years I’ve come across accounts from others. In chronological order, Nicholas Chirls on his time at Lehman.

Investment banking was a default career of sorts in 2007, something for the over-achieving ivy-league kids who didn’t quite know what the hell they wanted to do, and there were plenty of jobs being given out back then… I vividly remember walking into the career services office my junior year in college and being prompted with, “Well, which bank would you like to work for?”

Alex Rampell on ‘winning’ in life.

If you're in a competitive thing, “I want to win” in high school, you want to go to the best college. Part of winning in college is you want to get the best job… If you're 22 and working at McKinsey or Goldman Sachs and the pay isn't even that good relative to other things that you can potentially be doing with your time if you know how to optimize it, it has the imprimatur of “Wow, you are a very successful person!” and that's the hardest job to get after graduating from a school like Harvard.

This week, David Perell on integrity.

One place we try to get integrity without people being themselves is the classic thing where… they become lawyers and doctors and they go into a very small list of fields. What I think lacks integrity about that, I mean there’s a nobility to it of “hey you’re going to do your family well”, but what lacks integrity about it is it misses human individuality, which I really value and think is important.

It's something I've been thinking about a lot after spending time in Asia over the holiday period.

I digress. Today I led the Engineering Leadership roundtable at Recurse Center. It’s a shift toward recognizing that both technical and people skills are needed to succeed. It’s a shift from my younger self, whether it be the financier working on sophisticated cross-currency swaps or the software engineer wanting my manager to provide air cover so I can focus on coding. It’s a shift to becoming the responsible adult.

W4D3 - The real 30 under 30 imo you have never heard of

I've been sharing a couple of 'hot take' blog posts, and I've caught myself clicking refresh to see how many views they've had. Not terrible, but I do that instead of spending heads-down time learning new things. I've been spending time building the façade. I'm reminded of what Andrej Karpathy had to say.

In AI at least, the real 30 under 30 imo you have never heard of. They are 5 layers down the org chart from the CEO. They are usually not on Twitter, they have an unmaintained LinkedIn, they don’t go on podcasts, and they maybe published at one point but don’t do so anymore. They are today’s Einsteins and directly invent and build miracles. I wish they weren’t hidden inside big tech and startups in return for very large paychecks but here we are.

Now let's go build, this time for real.

W4D2 - Machine Learning is Not *That* Hard for Software Engineers

I try to keep a positive vibe through my posts, but that changes today. Today I am going to rant.

I read Machine Learning is Still Too Hard for Software Engineers by George Mathew, having discovered it at the top of Hacker News. 

Now, where do I start?

The author described how, to do something basic like image classification, you have to:

Understand concepts like tensors, loss functions, transfer-learning, logistic regression, network fine-tuning, hyper-parameter search, over-fitting, active learning, regularization, and quantization.

This is like telling a software engineer that, to set up a database, you have to (special thanks to ChatGPT):

Understand concepts like indexing, transaction logs, replication, query optimization, database tuning, parameter indexing, redundancy, real-time processing, data normalization, and compression.

No, it’s literally a pip install (or nothing at all, since sqlite3 ships with Python) and then this (from the sqlite3 docs):

  import sqlite3
  con = sqlite3.connect("tutorial.db")
  cur = con.cursor()
  cur.execute("CREATE TABLE movie(title, year, score)")

Next the author talks about the need for Exploratory Data Analysis (EDA), which involves a closer look at your data to figure out whether work is needed before, say, feeding it to a model (like maybe you need to scale features first so they have zero mean and a standard deviation of one).
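To be fair to both sides, the scaling step itself is a couple of lines; a toy sketch with made-up data:

  import pandas as pd
  from sklearn.preprocessing import StandardScaler

  # Made-up data standing in for whatever you plan to feed a model.
  df = pd.DataFrame({"age": [23, 35, 58, 41], "income": [40_000, 72_000, 95_000, 61_000]})

  print(df.describe())  # quick EDA: count, mean, std, quartiles per column

  scaled = StandardScaler().fit_transform(df)
  print(scaled.mean(axis=0), scaled.std(axis=0))  # ~0 means, ~1 standard deviations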

Not sure where dude’s been but there’s a million blog posts on using ChatGPT to do EDA (even OpenAI called it Advanced Data Analysis at one point!). It’s also highly curious that data quality issues come under the header ‘Software’.

Finally, infra. Apparently “once you’ve figured out the ML and software bits, you’ll need cloud infrastructure expertise”.

I mean, the docs for AWS SageMaker and AWS Lambda read equally obtuse to me.

There are startups that are trying to make this better. Modal so you don’t have to read AWS docs. Modular so you can write Python vs CUDA.

Yes, they’re new but come on, man, we’re all trying to make this better. Don’t go off talking about how hard ML is and then go “Want to build your own classifier in just minutes?” (which is behind a login page to boot).

It’s lipstick on a pig. Don’t do this.

I had the privilege of working with Thomson Nguyen at Square; he had the following analogy. A car is a very complicated machine, but to drive it you only need to know a couple of things: steering wheel, gears, pedals. Same thing for ML.

That doesn't mean you stop there. Get started, keep going, and don't let perfect get in the way of being good.

Now let’s go build.

W4D1 - Artfully disguise your way, treading the path until you reach the realm of accomplishment

Author’s note: As I write this, I noticed how the previous post had the most views by far in this series. Thank you for your support! Feel free to get in touch here if there’s particular areas you’re curious about, I would love to hear from you!

I had been exploring different generative AI topics and now find myself wanting to be a bit more focused. I need to have a goal, but what could it be? The last time I got this excited about a technology was probably 10 years ago, when I made the move from Berlin to San Francisco to take a closer look at data science and machine learning.

Now what goal did I have then? I started with Stanford’s Statistical Learning online course (with R!), learned Python via Codecademy and Peter Norvig’s Design of Computer Programs online course, and started attending meetups. In particular, I remember attending a Python meetup where one of the contributors to scikit-learn shared new features from the most recent release.

That might have been my first Python meetup, but I wondered to myself, what are the existing features of scikit-learn? If it’s a question I’m asking myself, surely there are others asking the same question too.

I put together a presentation that helps you go from 0-to-1 with pandas and scikit-learn. I set up the Kaggle Berlin meetup and presented it there twice. I applied to present at PyCon UK 2014 and got accepted! It was a blast! Plus saying “I’m here to present at a technology conference” was the fastest immigration experience I had entering the UK.

What do I have in mind this time around? It’ll be some variation of getting the audience 0-to-1 with generative AI, more details to come.

I love reading about paradigm shifts, and learning to program was my entry to the previous one. The theme in the mid-2010s was that there’s so much data, and those skilled at extracting insight from it would be best placed to ride the waves of change. IMHO the theme now is that we have this new ‘alien technology' that we’re still figuring out how to use.

That technology is generative AI.

If you’re curious about the title, I translated ‘fake it till you make it’ to German, then back to English under different translation styles and picked my favorite one. I know, right? Amazing!

W3D5 - Ich kann denken. Ich kann warten. Ich kann fasten

Say what you will about ChatGPT and the humanities, but it’s been about 10 years since I first picked up Hermann Hesse in Berlin and I’ve never looked at the German text… until now.

I've been watching David Perell on the How Do You Use ChatGPT podcast, where he compares different translations of the Bible and discusses how the source language affects the English translation. I tried this out with one of my favorite quotes from Hesse’s Siddhartha, this version from Stanley Appelbaum.

“What is it that you've learned, what you're able to do?" "I can think. I can wait. I can fast." "That's everything?" "I believe, that's everything!" "And what's the use of that? For example, the fasting -- what is it good for?" "It is very good, sir. When a person has nothing to eat, fasting is the smartest thing he could do. When, for example, Siddhartha hadn't learned to fast, he would have to accept any kind of service before this day is up, whether it may be with you or wherever, because hunger would force him to do so. But like this, Siddhartha can wait calmly, he knows no impatience, he knows no emergency, for a long time he can allow hunger to besiege him and can laugh about it. This, sir, is what fasting is good for.”

Now in the original German as per Project Gutenberg.

"Was ist es, das du gelernt hast, das du kannst?" "Ich kann denken. Ich kann warten. Ich kann fasten." "Das ist alles?" "Ich glaube, es ist alles!" "Und wozu nützt es? Zum Beispiel das Fasten—wozu ist es gut?" "Es ist sehr gut, Herr. Wenn ein Mensch nichts zu essen hat, so ist Fasten das Allerklügste, was er tun kann. Wenn, zum Beispiel, Siddhartha nicht fasten gelernt hätte, so müßte er heute noch irgendeinen Dienst annehmen, sei es bei dir oder wo immer, denn der Hunger würde ihn dazu zwingen. So aber kann Siddhartha ruhig warten, er kennt keine Ungeduld, er kennt keine Notlage, lange kann er sich vom Hunger belagern lassen und kann dazu lachen. Dazu, Herr, ist Fasten gut."

ChatGPT came up with different translators altogether; here it is describing each translator’s style.

Hilda Rosner: Rosner's translation would likely focus on clarity and directness, maintaining the simplicity of the original German while making it accessible to English readers. Her translation would read smoothly, capturing the practical wisdom in the passage without overly embellishing the language. 

Joachim Neugroschel: Neugroschel's translation might delve deeper into the cultural and philosophical context, possibly adding footnotes or explanations to convey the full depth of the original text. His version would be rich and detailed, ensuring that the cultural and philosophical nuances are not lost in translation. 

Susan Bernofsky: Bernofsky's translation would focus on capturing the lyrical and meditative quality of Hesse's writing. She would aim to reflect the original tone and mood, ensuring the translation feels as poetic and introspective as the German text. Her version would resonate with the philosophical and spiritual elements of the passage.

While ChatGPT wasn’t able to find the actual translations, I got it to translate from German to English in the style of each translator.

Hilda Rosner: "It is very good, sir. When a person has nothing to eat, fasting is the wisest thing he can do. For instance, if Siddhartha had not learned to fast, he would have had to take up some service today, whether with you or elsewhere, because hunger would have compelled him. But now, Siddhartha can calmly wait; he knows no impatience, no distress, he can let hunger besiege him for a long time and still smile about it. This, sir, is the value of fasting." 

Joachim Neugroschel: "It is exceedingly beneficial, sir. If a man has naught to eat, fasting becomes the most prudent action he could undertake. Take Siddhartha, for example; had he not mastered the art of fasting, he would be compelled to accept any form of servitude this very day, be it with you or elsewhere, driven by the coercion of hunger. However, Siddhartha can await events with serenity; he is a stranger to impatience, to dire straits, able to endure the siege of hunger for extended periods, and can regard it with a hint of amusement. Such, sir, is the essence and purpose of fasting." 

Susan Bernofsky: "It is profoundly good, sir. When one finds oneself without sustenance, fasting emerges as the most sagacious course one might embrace. Consider Siddhartha: had he not been adept in the discipline of fasting, he would find himself compelled to enter into servitude this very day, whether in your employ or elsewhere, as hunger's inexorable demand would dictate. Yet, in his way, Siddhartha can await with tranquility, untouched by the throes of impatience or desperation, able to let the pangs of hunger encircle him for lengthy spans, and meet it all with a trace of laughter. In this lies the true virtue of fasting, sir."

It looks plausible (I quite like Neugroschel's version), but it doesn’t hurt to be a little skeptical. I asked my friend Spencer Strub, who teaches writing, and this is what he said:

This shows ChatGPT's strengths -- an impressively thorough digest of the approaches of each translator, and in nice prose, too! -- but that last paragraph also shows why students mess themselves up when they just turn in ChatGPT output: it's not going to tell us which would be best and why. That's a subjective judgment, and ChatGPT rightly doesn't offer it. It's a cool exercise, though...

Thanks for getting this far! In a nutshell, even if there’s room for improvement, this is the furthest I’ve gotten into a close reading of Siddhartha despite reading it almost every year. All thanks to a word calculator.

I’ve been trying different things since the start of the year, and this coming week I’m experimenting with the following focus areas:

  1. Generative AI: Full Stack LLM bootcamp, build apps and deploy to the cloud (likely Vercel + Modal)
  2. People skills: Staff+ Engineer roundtable
  3. Communication skills: Ultraspeaking

On generative AI, I did a speed run of Andrej Karpathy’s Zero to Hero series at my most recent RC batch. I contemplated a re-run of that with exercises, or going through the fast.ai courses. I met one of the Full Stack LLM instructors at the Modal Labs meetup yesterday, took a closer look at the materials, and thought this would be a good way to learn generative AI from a product-centric perspective.

On people skills, it’s a recent focus and I’m now switching gears to a group setting. I’ve set up the Staff+ Engineer roundtable at Recurse Center for next week, and I’m excited to discuss a few threads I’ve been pulling on: professional coaching, networking, self-discovery. In fact, I tried out the exercise from the podcast above by pasting the results of a personality test into ChatGPT and asking who I work well vs. less well with. I was blown away by the less-well personas, perhaps something I can share at the roundtable if there’s interest.

On communication skills, I start the Ultraspeaking course next week! The YouTube channel has illustrative videos of what we’ll get into. I know myself well enough to know I need a class setting for consistency. Happy to share my findings as the course progresses.

OK so this post has been more about what I plan to do rather than what I did today. I shared David Perell’s podcast with more people, describing it as follows.

In this episode the writer David Perell talks about how deliberate he is about learning about himself and understanding what he could be world-class at, and in particular where ChatGPT fits into that process.

I replied to a few tweets: here, here and here. I signed up for more events: this one on applied LLMs and this one on sharing outage stories the following week. 

The following quotes echo advice on building expertise in Python that I previously shared on this blog; this came up in an RC discussion. Here is David Beazley.

As an aside, sometimes people ask me "what can I do to improve my Python skills?" Much to their surprise, I often suggest doing a project in a completely different language or outside of their area of expertise. I think the main benefit of doing this is that you'll often see a completely different way of thinking about a problem that you can bring home to your own projects.

Next Peter Norvig.

Learn at least a half dozen programming languages. Include one language that emphasizes class abstractions (like Java or C++), one that emphasizes functional abstraction (like Lisp or ML or Haskell), one that supports syntactic abstraction (like Lisp), one that supports declarative specifications (like Prolog or C++ templates), and one that emphasizes parallelism (like Clojure or Go).

Finally Donald Knuth.

If you find that you’re spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you’re spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice.

Never gets old.


All the resources I shared