I remember focusing on fintech startups in the previous job search. Personally I felt it's important that the business problems the company is solving are the ones that I care about. This interview cycle, I realized that I also care about data tools having had first-hand experience of the pain points (though perhaps I employed artistic license in the retelling).
The dominant theme this interview cycle, however, is AI.
AI startups getting wiped out
Last year, the thought of joining an AI startup scared me. Imagine joining a startup that wraps the OpenAI API to do pdf extraction. Further down the road OpenAI enables pdf uploads and the startup gets wiped out.
This year, a startup I interviewed with does exactly that!
It turns out there's a spectrum. At one end, you have generic legal documents that have no repeatable visual structure, where the meaning of each word is enriched by other words surrounding it. This is where LLMs shine. At the other end, you have say a Bank of America statement with the logo on the top right and each row being a date - transaction description - amount triplet; existing visual layout-based techniques do a good job here extracting the contents based on a pre-defined schema.
Thus the value-add is not just wrapping OpenAI's API, it's knowing when to use existing techniques vs running it through an LLM. Like all tools there are trade-offs, in this case extraction quality, latency and cost. In fact, LLMs are useful for converting the 'long tail' of bank statements into schemas, introducing a more efficient workflow to run existing techniques!
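The routing idea above can be sketched in a few lines. This is only an illustration: the `Document` type, the `KNOWN_TEMPLATES` set and the extractor names are hypothetical, and a real system would also weigh extraction quality, latency and cost.

```python
from dataclasses import dataclass

# Hypothetical: document types for which a layout-based template already exists.
KNOWN_TEMPLATES = {"bank_of_america_statement"}


@dataclass
class Document:
    doc_type: str  # e.g. "bank_of_america_statement" or "legal_contract"
    text: str


def choose_extractor(doc: Document) -> str:
    """Pick an extraction path for a document (illustrative names only)."""
    if doc.doc_type in KNOWN_TEMPLATES:
        # Repeatable visual structure: use the schema-driven layout technique.
        return "layout_template"
    # No repeatable structure: fall back to the LLM path.
    return "llm"
```

In this framing, using an LLM to turn a 'long tail' statement into a schema amounts to adding a new entry to `KNOWN_TEMPLATES`.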
AI startups interviewing differently
I had take-home exercises as part of the interview process. The first two had mixed results - I moved on to the next stage for one, got passed on the other. I had used ChatGPT sparingly in my submissions, and I wondered, should I have used it more extensively?
Things got more interesting when I received the next two take-homes. The first was for an AI team so the use of AI tools was implied. For the second it was explicitly encouraged! This completely threw me off. Without ChatGPT I had a rough sense of where the 'bar' was; with ChatGPT, isn't the sky the limit? Surely I can make my submissions arbitrarily sophisticated.
I panicked. I meditated on the issue. I felt better after letting my thoughts settle.
I realized the more complicated your submission is, the larger the surface area of questions you'll get asked about. While you can push your submission well beyond your comfort zone, you take on more risk of being unable to answer the questions that might come later!
AI startups having an unfair advantage
The final thought I had was around startups vs incumbents. The refrain is startups are able to beat incumbents by being more nimble, and by taking advantage of new technology (before you say Clay Christensen, I present to you Jill Lepore). Think Uber disrupting taxis since everyone now has location-enabled smartphones.
This time the new paradigm is AI. Not only can startups starting today more easily build their product on top of AI, they can also curate a team with ChatGPT skills to help each other level up and streamline internal business processes.
Surely startups starting today have a huge edge, being at the crest of this new wave?
AI is a complicated tool. It's powerful, sure, but lots of startups (some supported by eye-watering amounts of fundraising) are still trying to figure it out. It goes without saying that recent startups have an advantage, but it's tempting to take this too far.
Let's say a startup has been around for a few years, has product-market fit and proprietary data. This acts as a moat from new AI startups! It's true there's a correlation between company age and rigidity of business processes, but what's more important is the company having self-awareness in realizing the need to adopt new tools and workflows.
In other words, there's a 'sweet spot' - you want to be early enough that you're able to integrate AI well, but also have maturity in knowing your customers and their problems well. This way you can focus on the challenge of learning new tools and workflows, instead of having to do that as well as learning about your customers.
I had been behind on writing blog posts, and I took notes to write out in full when I found the time. I now declare ‘writing bankruptcy’, but I’d still like to share my notes.
A theme I’ve been coming across recently is “software is eating the world, but now AI is eating software”. It reminded me of this quote.
Technological advance is an inherently iterative process. One does not simply take sand from the beach and produce a Dataprobe. We use crude tools to fashion better tools, and then our better tools to fashion more precise tools, and so on. Each minor refinement is a step in the process, and all of the steps must be taken.
I spoke to a few early-stage startup founders, and I wondered how they thought about building a team around them. Cue Hank Paulson.
There is no perfect leader. Everyone is flawed and their strengths are usually the opposite of a weakness. The key ingredient I found for any of these CEOs that was just essential to success is the team they put around them. I called it "the right people in the right seats". You needed to play to your strength but you need to put people around you who you listen to, who can compensate for your weaknesses. If you didn't these big jobs always uncover your weaknesses.
I had been at law school in the early 2010s, and one summer I met up with a lawyer friend in London. I shared with her my disappointment in not getting an internship in corporate law.
Instead of offering words of comfort, she said something along the lines of “you have a tendency to speak your mind but as lawyers we often need to err on the side of being diplomatic; I’m not sure you missing out on internships is necessarily a bad thing.”
Now I can tell her, she’s right. Working with software is not without its quirks, but I’ve yet to see someone describe their experience in the vein of the following excerpt.
There’s a saying out here. Every man has three hearts. One in his mouth for the world to know. Another in his chest, just for his friends. And a secret heart buried deep where no one can find it.
I grew up in Kuala Lumpur. I’ve since had the chance to live in London, New York, Paris and Berlin. I’ve been in San Francisco since 2015.
While SF is the smallest city in the list, being here I’ve ironically discovered the wonder of being away from the city. I spent time in Mendocino this week. It’s amazing to get away from the day-to-day. It’s like moving out of the trenches to see the battlefield.
We transition from school to college to work, often learning skills throughout (at least, ideally). What I’ve also since discovered is the importance of learning to unlearn.
We develop mental models to simplify how things work in reality, except often we retain those models even when they don’t work so well any more. Either the reason we adopted them, or our priorities, or the underlying world itself has changed; yet the models remain. Sometimes they become our sacred cows.
What helps lay the groundwork for revisiting those mental models is creating more space. How to create space? Different things work for different people. Meditation. Journaling. Family. Friends. Nature.
I’m grateful to have made this latest discovery. Especially at times when the world feels like it’s at an inflection point, it’s reassuring to be surrounded by timelessness.
A question I had during an interview was “what does a perfect day look like to you?”.
I responded talking about working with the perfect team, a variation on a post that I had put up on LinkedIn.
The best team I was a part of made meetings fun. I loved hearing what others are working on and which bits are blockers for me, and sharing what I’m working on to learn which bits are blockers for them. Without prompting from managers, team members would regularly share scripts, documentation and resources to help each other level up and move faster.
After the interview I thought about the question a bit more, and followed up by e-mail with a note of thanks and a quick postscript - coming up with an elegant solution after meditating on the problem for a few days (a la Hammock-Driven Development).
Having slept on it, I lean back towards the team. This is one of my favorite excerpts from Sam Altman’s advice when he turned 30.
Go out of your way to be around smart, interesting, ambitious people. Work for them and hire them (in fact, one of the most satisfying parts of work is forging deep relationships with really good people). Try to spend time with people who are either among the best in the world at what they do or extremely promising but totally unknown. It really is true that you become an average of the people you spend the most time with.
It’s tempting to think about my own path on the back of some design, but this desire to be around smart, interesting, ambitious people has been a recurring theme.
I studied math in college partly because I thought I’d get into a better school than with a more practical subject. I went to law school because my friends who did law seemed to ‘know how the system works’, making me realize that knowing the grey areas in life is an effective complement to being good with numbers.
Today I work with software. In terms of unique aspects, there is a level of objectivity (it’s pretty clear when your code runs fast) as well as value capture from network effects. The aspect that I like the most can perhaps be best explained through a story.
Cash App came out of a hack week at Square, yet the buyer-facing Cash App now processes more volume than the seller-facing side of the company. There’s no amount of top-down effort that gets people excited to build something that eventually overtakes the original idea. That happens through getting smart, interesting, ambitious people in the same room and having them believe they have the agency to make magic happen.
When I was interviewing last year, I was keen to move back to fintech. I wanted better context on problems in a specific domain, to match that against new capabilities technology now ‘unlocks’. The example that came to mind was Square helping vendors at farmers' markets accept credit card payments, enabled by more accurate ML-based risk models.
Now I realize that domain doesn’t have to be fintech. It’s true, most of the exposure I’ve had has been in payments and lending. That being said, what’s important is being at the interface between the business domain and technology. The same principle, of understanding domain-specific pain points and how to alleviate them with more sophisticated tools, still applies.
Perhaps Justine Musk says this best.
Choose one thing and become a master of it. Choose a second thing and become a master of that. When you become a master of two worlds (say, engineering and business), you can bring them together in a way that will a) introduce hot ideas to each other, so they can have idea sex and make idea babies that no one has seen before and b) create a competitive advantage because you can move between worlds, speak both languages, connect the tribes, mash the elements to spark fresh creative insight until you wake up with the epiphany that changes your life.
I shared her post on the first blog entry of my first time at Recurse Center. It’s a nice way to come full circle.
In a previous role, we had audits to help us refactor our data pipelines with confidence. While it may make sense to run audits in staging, staging data can be very different from production data. This means that changes get merged based on guardrails that allow staging to run successfully, but end up breaking in production.
To get around this, you run tests in production. Never test in production you say? What’s the use of testing if it doesn’t stop you from breaking things? The following quote is from Erik Bernhardsson.
Lets let the right workflows emerge from what makes teams the most productive, and lets let data workflows stand on their own feet.
The book of best practices for data (perhaps ML too) is still being written. Plus there’s probably a startup idea there somewhere.
Spending time at Recurse Center between roles is a great opportunity to think a bit deeper about what I’d like to do next. This is usually through chats with fellow RCs in batch, and it’s always fun to hear how others are going about their search.
I didn't do RC this time around, so that ‘refinement’ process has to be done on the go with real interviews. An interesting question I was asked was “what’s special about having machine learning in your system?”.
I thought about my time at Square and what came to mind was this idea of moving from a deterministic system to a probabilistic one. Suppose a seller owes Square $100, and Square initiates a $100 ACH debit to recover those funds. It takes 3 business days before you know if the debit succeeds. Now the next day the seller needs to be credited $50. What do you do?
In the optimistic case, you assume that the debit will succeed so you let the $50 go through. In the pessimistic case, you assume that the debit will fail so you hold the $50 for two business days. Which one is the right call?
The flow with ML is to train a model based on historical data. If the seller looks closer to one where the debit would succeed, then let the $50 go through. If the seller looks closer to one where the debit would fail, then hold the $50. Hence we switch from making a binary decision, to having a threshold that determines how we would act.
Of course, there is also the question of whether our historical data is sufficient for model training and, if it is, whether the return on investment justifies the extra complexity.
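The thresholding step above can be sketched as follows. This is a minimal illustration, not Square's actual logic: the function name, the 0.9 default and the idea that the model outputs a success probability are all assumptions.

```python
def should_release_credit(p_debit_succeeds: float, threshold: float = 0.9) -> bool:
    """Decide whether to let the $50 credit through while the debit is pending.

    p_debit_succeeds is a (hypothetical) model score: the estimated probability
    that the pending ACH debit will succeed.
    """
    if not 0.0 <= p_debit_succeeds <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Release the funds only when the model is confident enough in the debit.
    return p_debit_succeeds >= threshold
```

The two deterministic policies are the extremes of this rule: a threshold of 0.0 always releases (optimistic), and a threshold above 1.0 never does (pessimistic). Picking a value in between is where the trade-off lives.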
Andrej Karpathy takes this a step further in his post Software 2.0, jokingly describing it as follows.
Gradient descent can write code better than you. I’m sorry.
This was written in 2017, and is perhaps even more pertinent now.
For my first data science project, I trained a machine learning model using Lending Club data. The hypothesis is I can use that data to ‘cherry pick’ loans in a way that outperforms the average return. More specifically, all loans within a certain grade pay the same interest rate and if I can stack rank to select the top 5% say, then I beat the average.
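The stack-ranking step can be sketched like this. It assumes each loan already has a model score where higher means more attractive; the function name and the score dictionary are hypothetical, not the code from the original project.

```python
import math


def pick_top_fraction(scores: dict[str, float], fraction: float = 0.05) -> list[str]:
    """Return the ids of the highest-scoring loans within one grade."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Always pick at least one loan, even for very small pools.
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

Since every loan in a grade pays the same rate, the selection only beats the average if the scores actually separate loans that repay from loans that default.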
What I didn’t quite expect is to later work for a startup that did exactly this.
I digress. The point I’m trying to get to relates to the loan data itself. The model I built mainly used the numerical features like the borrower’s FICO score and income. There was a text blob column which is what the borrower wrote as their intention for the loan proceeds. This text blob became simple features like number of characters, number of words, number of sentences.
With text embeddings, this column can now be a vector and the whole model can be trained on both the numerical features and the semantic meaning of the text. It’s one use case of many that language models now unlock.
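For contrast with embeddings, here is roughly what the simple count features look like. This is a sketch rather than the original project code; the sentence split is deliberately naive and the feature names are made up.

```python
import re


def simple_text_features(text: str) -> dict[str, int]:
    """Turn a free-text loan description into three count features."""
    words = text.split()
    # Naive split: a sentence ends at ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_sentences": len(sentences),
    }
```

These counts capture how much the borrower wrote, not what they wrote, which is the gap that embeddings fill.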
In a post on job searching, Haseeb Qureshi advises candidates to be excited.
Be excited about the company. It’s trite, but it makes a huge difference. Be excited to interview. Be excited to learn about what your interviewer does and the prospect of getting to work with them.
I'm often fascinated by how much you learn about companies through the interview process; plus it's often less motivating to do so outside of a job search. This helps me stay excited.
I've been on the job search a few times now, but I still get the emotional swings. I guess it'll never go away, so the only thing you can do is make the experience less painful.
I came across this quote by Jack Kornfield on Tim Ferriss' podcast.
Let suffering teach you compassion because in Tibet, in some of the Tibetan teachings, they actually pray for suffering. They say, "May I be granted enough suffering so that the great heart of compassion will open in me."
It's a helpful reframing, to think of the experience as another rep at the emotional gym.
My notes on what I'm doing daily have become more sparse; note to self: be more verbose.
What is in my notes is reading Vicki Boykis' post "What we don't talk about when we talk about building AI apps". In particular, even though we're moving into this new paradigm of LLMs, we still have to contend with bloated Docker containers. The post describes images with deep learning libraries as large as 10 GB!
This actually reminded me of work I did at Airtable to reduce Docker image size, by
I know, it's nothing fancy. The last one was particularly counter-intuitive, but the AWS rep said they're using a Go git client and apparently that made a difference (it's also unclear whether the last hack still works). That said, all in all a 60% reduction in image size.
Modal discusses building a Docker-compatible custom container runner, image builder and filesystem. Nice.
It’s the last day of January, so I thought it’s worth reviewing the goals I set out for the year. I’m done with 3 out of the 5.
2. Text transcription
I started with OpenAI’s Whisper model to transcribe audio recordings; running it locally took 8 hours for 1 hour of audio. I looked into using Modal to be able to run it in the cloud on GPUs. I went to a LangChain meetup and discovered the quantized version of Whisper, which cuts down the local running time from 8 hours to 2 hours. We could do better, sure, but this solves my use case.
4. Twitter bookmarks
I tried out using Twitter’s API but got stuck at auth issues, and I didn’t have confidence the API would work as intended given the issues the company is having. I came across Dewey but didn’t want to pay $10 / month. It turns out it’s easy enough to inspect requests from Chrome and paginate your way to glory.
5. ChatGPT conversations
At the Applied ML RC event I was told now you can just download your conversation. OK I guess we’re done.
I've always been interested in paradigm shifts. Some are major, others are smaller in scale.
I came across the instructor library while looking around at tooling around LLMs. The premise is simple - if we now send prompts to language models and get a text blob back, how do we figure out if the text blob has the right structure? For example, how do we make sure the JSON parses correctly or has the right keys?
With instructor, pydantic is all you need. You get data model validation, and it outputs JSON Schema, which is what OpenAI function calling uses. It's a nice intersection of LLMs and type theory.
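To make the underlying problem concrete, here is a standard-library-only sketch of the check (does the JSON parse, and does it have the right keys?). It is not instructor's or pydantic's API; the `REQUIRED_KEYS` schema and the function name are made up for illustration, and pydantic replaces this kind of hand-rolled validation with a declared model.

```python
import json

# Hypothetical schema: the keys we expect the model to return, and their types.
REQUIRED_KEYS = {"title": str, "year": int}


def parse_llm_output(raw: str) -> dict:
    """Parse a text blob from a language model and check its structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for key: {key}")
    return data
```

A failed check is useful in itself: the error message can be sent back to the model in a retry prompt.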
Every now and again I re-read Tyler Cowen's interview in the New Yorker titled "Are computers making society more unequal?", in particular this quote.
You’ll need to be really good at judging, and being dispassionate, and you’ll have to have a sense of what computers can and cannot do. It’s about working with the machine: knowing when to hold back, when to intervene.
I had the chance to listen to him on the podcast How Do You Use ChatGPT. In particular, he mentioned how ChatGPT will be egalitarian at the start, say by writing good college essays for everyone. Over time, however, leverage will flow to those best able to start projects and allocate resources. Hence the title of this post.
Today I attended LangChain's 'hacking hours' event, which presumably was organized to get people ramped up on LangGraph. I had a few interesting chats though amusingly not about LangChain. I came across Sasha Rush's GPU Puzzles, which is a set of notebooks to teach GPU programming through a set of puzzles. This led me to Tensor Puzzles, which teaches you how to think in tensors.
I had been using Whisper to transcribe audio files. At the event I was introduced to whisper.cpp, a C/C++ port of Whisper that can run quantized versions of the model, i.e. with weights stored as lower-precision floats and integers. This reduces model size and speeds up inference, and in particular improves performance on CPUs. With Whisper it took 8 hours to transcribe 1 hour of audio; with whisper.cpp I needed just 2 hours!
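To get a feel for what 'lower precision' means, here is a toy sketch of symmetric 8-bit quantization in plain Python. It is a generic textbook scheme with made-up function names, not the scheme whisper.cpp actually uses.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero when all weights are zero (or the list is empty).
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]
```

Each weight now fits in one byte instead of four (for 32-bit floats), at the cost of a rounding error of at most half a scale step per weight.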
It's a bit harder to motivate building a web app now that my use case can be done locally...
My first job out of college was in finance. My role involved coming up with complicated financial structures that minimized the interest rate and currency risk our clients were exposed to when they raised debt financing. I enjoyed the technical aspects, but in a way I feared what my job would entail if I were to rise in seniority.
The senior members of the team brought in business, junior members executed the work. In that way finance is similar to other professional services like law or consulting. That said I wasn’t sure I wanted my performance evaluated on how much my client liked me, whether that be at the office or the golf course. What if I don’t like golf?
Over the years I’ve come across accounts from others. In chronological order, Nicholas Chirls on his time at Lehman.
Investment banking was a default career of sorts in 2007, something for the over-achieving ivy-league kids who didn’t quite know what they hell they wanted to do, and there were plenty of jobs being given out back then… I vividly remember walking into the career services office my junior year in college and being prompted with, “Well, which bank would you like to work for?”
If you're in a competitive thing, “I want to win” in high school, you want to go to the best college. Part of winning in college is you want to get the best job… If you're 22 and working at McKinsey or Goldman Sachs and the pay isn't even that good relative to other things that you can potentially be doing with your time if you know how to optimize it, it has the imprimatur of “Wow, you are a very successful person!” and that's the hardest job to get after graduating from a school like Harvard.
This week, David Perell on integrity.
One place we try to get integrity without people being themselves is the classic thing where… they become lawyers and doctors and they go into a very small list of fields. What I think lacks integrity about that, I mean there’s a nobility to it of “hey you’re going to do your family well”, but what lacks integrity about it is misses human individuality which I really value and think is important.
I've been sharing a couple of 'hot take' blog posts, and I've caught myself clicking refresh to see how many views they've had. Not terrible, but I do that instead of spending heads down time learning new things. I've been spending time building the façade. I'm reminded of what Andrej Karpathy had to say.
In AI at least, the real 30 under 30 imo you have never heard of. They are 5 layers down the org chart from the CEO. They are usually not on Twitter, they have an unmaintained LinkedIn, they don’t go on podcasts, and they maybe published at one point but don’t do so anymore. They are today’s Einsteins and directly invent and build miracles. I wish they weren’t hidden inside big tech and startups in return for very large paychecks but here we are.
Now let's go build, this time for real.
I try to keep a positive vibe through my posts, but that changes today. Today I am going to rant.
I read Machine Learning is Still Too Hard for Software Engineers by George Mathew, having discovered it at the top of Hacker News.
Now, where do I start?
The author described how, to do something basic like image classification, you have to:
Understand concepts like tensors, loss functions, transfer-learning, logistic regression, network fine-tuning, hyper-parameter search, over-fitting, active learning, regularization, and quantization.
This is like telling a software engineer, to set up a database, you have to (special thanks to ChatGPT):
Understand concepts like indexing, transaction logs, replication, query optimization, database tuning, parameter indexing, redundancy, real-time processing, data normalization, and compression.
No, it’s not even a pip install (sqlite3 ships with Python’s standard library); it’s literally this (from the sqlite3 docs):
import sqlite3
con = sqlite3.connect("tutorial.db")
cur = con.cursor()
cur.execute("CREATE TABLE movie(title, year, score)")
Not sure where dude’s been but there’s a million blog posts on using ChatGPT to do EDA (even OpenAI called it Advanced Data Analysis at one point!). It’s also highly curious that data quality issues come under the header ‘Software’.
Finally, infra. Apparently “once you’ve figured out the ML and software bits, you’ll need cloud infrastructure expertise”.
I mean, the docs for AWS SageMaker and AWS Lambda read equally obtuse to me.
There are startups that are trying to make this better. Modal so you don’t have to read AWS docs. Modular so you can write Python vs CUDA.
Yes, they’re new but come on, man, we’re all trying to make this better. Don’t go off talking about how hard ML is and then go “Want to build your own classifier in just minutes?” (which is behind a login page to boot).
It’s lipstick on a pig. Don’t do this.
I had the privilege of working with Thomson Nguyen at Square; he had the following analogy. A car is a very complicated machine, but to drive it you only need to know a couple of things: steering wheel, gears, pedals. Same thing for ML.
That doesn't mean you stop there. Get started, keep going, and don't let perfect get in the way of being good.
Now let’s go build.
Author’s note: As I write this, I notice that the previous post had the most views by far in this series. Thank you for your support! Feel free to get in touch here if there are particular areas you’re curious about; I would love to hear from you!
I had been exploring different generative AI topics and now find myself wanting to be a bit more focused. I need to have a goal, but what could it be? The last time I got this excited about a technology was probably 10 years ago, when I made the move from Berlin to San Francisco to take a closer look at data science and machine learning.
Now what goal did I have then? I started with Stanford’s Statistical Learning online course (with R!), learned Python via Codecademy and Peter Norvig’s Design of Computer Programs online course, and started attending meetups. In particular, I remember attending a Python meetup where one of the contributors to scikit-learn shared new features from the most recent version release.
That might have been my first Python meetup, but I wondered to myself, what are the existing features of scikit-learn? If it’s a question I’m asking myself, surely there are others who are asking the same question too.
I put together a presentation that helps you go from 0-to-1 with pandas and scikit-learn. I set up the Kaggle Berlin meetup and presented it there twice. I applied to present at PyCon UK 2014 and got accepted! It was a blast! Plus saying “I’m here to present at a technology conference” was the fastest immigration experience I had entering the UK.
What do I have in mind this time around? It’ll be some variation of getting the audience 0-to-1 with generative AI, more details to come.
I love reading about paradigm shifts, and learning to program was my entry to the previous one. The theme in the mid-2010s was that there’s so much data, the ones who are skilled at extracting insight from data will be better able to ride the waves of change. IMHO the theme now is that we have this new ‘alien technology’ that we’re still figuring out how to use.
That technology is generative AI.
If you’re curious about the title, I translated ‘fake it till you make it’ to German, then back to English under different translation styles and picked my favorite one. I know, right? Amazing!
Say what you will about ChatGPT and the humanities, but it’s been about 10 years since I first picked up Hermann Hesse in Berlin and I’ve never looked at the German text… until now.
I've been watching David Perell on the How Do You Use ChatGPT podcast, where he compares different translations of the Bible and discusses how the source language affects the English translation. I tried this out with one of my favorite quotes from Hesse’s Siddhartha, this version from Stanley Appelbaum.
“What is it that you've learned, what you're able to do?" "I can think. I can wait. I can fast." "That's everything?" "I believe, that's everything!" "And what's the use of that? For example, the fasting -- what is it good for?" "It is very good, sir. When a person has nothing to eat, fasting is the smartest thing he could do. When, for example, Siddhartha hadn't learned to fast, he would have to accept any kind of service before this day is up, whether it may be with you or wherever, because hunger would force him to do so. But like this, Siddhartha can wait calmly, he knows no impatience, he knows no emergency, for a long time he can allow hunger to besiege him and can laugh about it. This, sir, is what fasting is good for.”
Now in the original German as per Project Gutenberg.
"Was ist es, das du gelernt hast, das du kannst?" "Ich kann denken. Ich kann warten. Ich kann fasten." "Das ist alles?" "Ich glaube, es ist alles!" "Und wozu nützt es? Zum Beispiel das Fasten—wozu ist es gut?" "Es ist sehr gut, Herr. Wenn ein Mensch nichts zu essen hat, so ist Fasten das Allerklügste, was er tun kann. Wenn, zum Beispiel, Siddhartha nicht fasten gelernt hätte, so müßte er heute noch irgendeinen Dienst annehmen, sei es bei dir oder wo immer, denn der Hunger würde ihn dazu zwingen. So aber kann Siddhartha ruhig warten, er kennt keine Ungeduld, er kennt keine Notlage, lange kann er sich vom Hunger belagern lassen und kann dazu lachen. Dazu, Herr, ist Fasten gut."
ChatGPT came up with different translators altogether, here describing each translator’s style.
Hilda Rosner: Rosner's translation would likely focus on clarity and directness, maintaining the simplicity of the original German while making it accessible to English readers. Her translation would read smoothly, capturing the practical wisdom in the passage without overly embellishing the language.
Joachim Neugroschel: Neugroschel's translation might delve deeper into the cultural and philosophical context, possibly adding footnotes or explanations to convey the full depth of the original text. His version would be rich and detailed, ensuring that the cultural and philosophical nuances are not lost in translation.
Susan Bernofsky: Bernofsky's translation would focus on capturing the lyrical and meditative quality of Hesse's writing. She would aim to reflect the original tone and mood, ensuring the translation feels as poetic and introspective as the German text. Her version would resonate with the philosophical and spiritual elements of the passage.
While ChatGPT wasn’t able to find the actual translations, I got ChatGPT to translate from German to English in the style of each translator.
Hilda Rosner: "It is very good, sir. When a person has nothing to eat, fasting is the wisest thing he can do. For instance, if Siddhartha had not learned to fast, he would have had to take up some service today, whether with you or elsewhere, because hunger would have compelled him. But now, Siddhartha can calmly wait; he knows no impatience, no distress, he can let hunger besiege him for a long time and still smile about it. This, sir, is the value of fasting."
Joachim Neugroschel: "It is exceedingly beneficial, sir. If a man has naught to eat, fasting becomes the most prudent action he could undertake. Take Siddhartha, for example; had he not mastered the art of fasting, he would be compelled to accept any form of servitude this very day, be it with you or elsewhere, driven by the coercion of hunger. However, Siddhartha can await events with serenity; he is a stranger to impatience, to dire straits, able to endure the siege of hunger for extended periods, and can regard it with a hint of amusement. Such, sir, is the essence and purpose of fasting."
Susan Bernofsky: "It is profoundly good, sir. When one finds oneself without sustenance, fasting emerges as the most sagacious course one might embrace. Consider Siddhartha: had he not been adept in the discipline of fasting, he would find himself compelled to enter into servitude this very day, whether in your employ or elsewhere, as hunger's inexorable demand would dictate. Yet, in his way, Siddhartha can await with tranquility, untouched by the throes of impatience or desperation, able to let the pangs of hunger encircle him for lengthy spans, and meet it all with a trace of laughter. In this lies the true virtue of fasting, sir."
It looks plausible (I quite like Neugroschel's version) but it doesn’t hurt to be a little skeptical. I asked my friend Spencer Strub who teaches writing, and this is what he said:
This shows ChatGPT's strengths -- an impressively thorough digest of the approaches of each translator, and in nice prose, too! -- but that last paragraph also shows why students mess themselves up when they just turn in ChatGPT output: it's not going to tell us which would be best and why. That's a subjective judgment, and ChatGPT rightly doesn't offer it. It's a cool exercise, though...
Thanks for getting this far! In a nutshell, even if there’s room for improvement this is the furthest I’ve gotten into a close reading of Siddhartha despite reading it almost every year. All thanks to a word calculator.
I’ve been trying different things since the start of the year, and this coming week I’m experimenting with the following focus areas:
On generative AI, I did a speed run of Andrej Karpathy’s Zero to Hero series at my most recent RC batch. I contemplated a re-run of that with exercises, or going through fast.ai courses. I met one of the instructors at Full Stack LLM at the Modal Labs meetup yesterday, took a closer look at the materials and thought this would be a good way to learn generative AI from a product-centric perspective.
On people skills, it’s a recent focus and I’m now switching gears to a group setting. I’ve set up the Staff+ Engineer roundtable at Recurse Center for next week, and I’m excited to discuss a few threads I’ve been pulling: professional coaching, networking, self discovery. In fact, I tried out the exercise from the podcast above by pasting the results of a personality test into ChatGPT and then asking who I work well vs less well with. I was blown away by the less-well personas, perhaps something I can share at the roundtable if there’s interest.
On communication skills, I start the Ultraspeaking course next week! The YouTube channel has illustrative videos of what we’ll get into. I know myself well enough to know I need a class setting for consistency. Happy to share my findings as the course progresses.
OK so this post has been more about what I plan to do rather than what I did today. I shared David Perell’s podcast with more people, describing it as follows.
In this episode the writer David Perell talks about how deliberate he is about learning about himself and understanding what he could be world-class at; in particular where ChatGPT fits in that process.
I replied to a few tweets: here, here and here. I signed up for more events: this one on applied LLMs and this one on sharing outage stories the following week.
The following quotes echo advice on building expertise in Python that I previously shared on this blog; this came up in an RC discussion. Here is David Beazley.
As an aside, sometimes people ask me "what can I do to improve my Python skills?" Much to their surprise, I often suggest doing a project in a completely different language or outside of their area of expertise. I think the main benefit of doing this is that you'll often see a completely different way of thinking about a problem that you can bring home to your own projects.
Next Peter Norvig.
Learn at least a half dozen programming languages. Include one language that emphasizes class abstractions (like Java or C++), one that emphasizes functional abstraction (like Lisp or ML or Haskell), one that supports syntactic abstraction (like Lisp), one that supports declarative specifications (like Prolog or C++ templates), and one that emphasizes parallelism (like Clojure or Go).
Finally Donald Knuth.
If you find that you’re spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you’re spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice.
Never gets old.
All the resources I shared
I read my own posts over the past 2 days and I’m disappointed. No wit, no style, no panache. Such is the reality of forcing yourself to write.
I often start my day responding to posts and sharing resources where relevant. I’ll make a point to add a section below to highlight these.
The notable event today was attending Modal Labs’ meetup. Modal is building a tool to cut down the wait on each iteration loop when deploying to the cloud. Writing code is more fun on dev because you get these fast feedback cycles, and there’s always a bit of ceremony with setting up AWS, configs etc. Modal makes this easy by allowing you to run code in the cloud by simply adding a decorator, which is pretty cool.
I’ve been making a point to attend events as a way to ‘widen the aperture’ in my job search. In the past I’ve gone through friends, ex-colleagues and recruiters, but now I’m curious to see what else is out there. I may end up in that direction; at least now I can say I took the chance to explore a bit more. Plus a lot of what’s being developed with generative AI is so new, it’s much easier to know what’s getting traction when you get to see how excited people are talking about it.
I had discovered the Full Stack LLM bootcamp before through Andrej Karpathy tweeting about it; yesterday I got to meet someone who created the content. I know, I already have other projects in mind but perhaps it’ll be good homework before the next meetup by Weights + Biases.
I’m only 30 mins into David Perell’s interview on How Do You Use ChatGPT but I am blown away. Highly recommend watching the first 5 minutes to see if it’s your thing. What’s so cool about it? I’m very impressed by how deliberate he is about learning about himself and better understanding what he can be world-class at, and I was amazed to see how he uses ChatGPT in that process.
I’m planning to restart the RC event on the Staff Engineer roundtable next week! I’ll end this post with a quote from Cindy Wu.
also one thing my engineering coach said to me that stuck with me that in every situation you have at least 3 options 1/ do nothing 2/ acceptance 3/ direct action. for a long time i didn't see a difference between 1/ and 2/ but now i see that they are very different. i think when i do 1/ i stay grumpy if i am upset about the situation. BUT if i do 2/ then i should feel more at peace with whatever mediocrity or imperfections i see. hope to be able to hone this muscle that helps me find acceptance in the things that we choose not to affect.
All the resources I consumed
All the resources I shared
I wrote yesterday how I wouldn’t have had a project to work on had I done Recurse Center this time around between roles. I realized that it’s only half true. Yes I need to sharpen my programming skills for interviews. However I do have something that I’ve been focusing on: people skills.
What’s different here is (1) it’s not directly related to software, and (2) it takes time. It takes time to come up with the questions to ask, to read up materials, to reflect on the ideas that surface. It’s not “my Raft implementation is working!”. There’s no start and finish. It’s an ongoing process.
It’s easy to fall into the trap of feeling stuck when looking for jobs. I’d like to take the time to explore more this time. Late last year I went to my first networking event since 2015. I look forward to more. I expect to feel out of place, but hey, that’s a chance to grow out of my comfort zone. It’s an opportunity to see what else is out there, what one could serendipitously discover.
In the past I’ve also gotten into the habit of focusing on a single project. I’m going to try having multiple small projects in parallel. Yes, some of them will involve the front end. In fact I played around with Modal as meetup prep to discover I can use it to run Whisper in the cloud (transcribing an hour-long audio clip takes 8 hours locally).
I didn’t complete Advent of Code Day 12. What’s interesting is that I write code that gives a closer representation of the problem, rather than a simplification that helps get the answer. For example, I would generate all permutations of a string, instead of counting how many there are. Is this brute force? Sort of, though even after optimizing string permutation it's still too slow. I suppose it’s slower to handle 1 million strings instead of a counter at 1 million.
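To make the generate-versus-count distinction concrete, here is a minimal sketch. It is not the Day 12 puzzle itself: the function names and the toy strings are assumptions chosen for illustration. The first function materializes every distinct ordering of a string and counts them; the second gets the same number from the multiset formula n! / (k1! * k2! * ...) without building a single string.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def count_by_generating(s: str) -> int:
    """Build every distinct ordering of s, then count them (slow)."""
    return len(set(permutations(s)))


def count_by_formula(s: str) -> int:
    """Count distinct orderings directly: n! divided by k! per repeated letter."""
    total = factorial(len(s))
    for repeats in Counter(s).values():
        total //= factorial(repeats)
    return total


if __name__ == "__main__":
    # Both approaches agree on a small input.
    print(count_by_generating("aabb"), count_by_formula("aabb"))
    # Only the formula stays cheap as the string grows.
    print(count_by_formula("abcdefghijklmnopqrst"))
```

The generating version does work proportional to the number of orderings, while the formula does work proportional to the length of the string, which is roughly the gap between handling a million strings and holding a counter at a million.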
I try to write blog posts the next day, but I’m writing this on the next next day because I’m stuck on Advent of Code Day 12. In fact I believe this is the first day that I didn’t complete both parts of the puzzle on the day itself.
I started reading interviewing.io posts on the job market (on 2024 and #opentowork) and felt demoralized.
Next I read my blog posts from my most recent batch at Recurse Center (yes this is why you should blog / write - for yourself to look back on). I wondered why I didn’t do RC this time and realized I can’t think of a specific project I want to spend time being heads down on. Reading through past posts, I felt much better coming across this post with the following quote.
One of my favorite phrases picked up in the past two years is the Shangaan phrase Hi Ta Xi Uma, which I learned from Renias Mhlongo, one of the top trackers in Africa. It means “we will find it,” and Renias will keep muttering it when he loses his track and struggles to find the next one. Everything is hard, and usually much harder than we can fathom. All the best people I’ve met through the podcast just don’t let that stop them. They also seem to develop an awareness of this constant difficulty and just become used to it.
We will find it.
I started doing a bit more background reading for an event I'll be heading to later in the week, organized by Modal. Erik Bernhardsson founded Modal, and while I absolutely love his post on building data teams, the quote is from a post about Modal itself.
I spent a lot of time as a CTO, and if I had to condense how to make engineers productive into one thing, it would be something like: make the feedback loops fast.
Another interesting thread in the post is data as a new engineering discipline, and how the 'book of best practices' is still currently being written. Related to this is machine learning engineering; both are relatively new and enjoying a resurgence on the back of the popularity of generative AI.
We drove up to Point Reyes today, so another sparse post.
I completed Advent of Code Day 7. The instructions clearly refer to a hypothetical Camel Cards game, but I implemented a poker hand strength evaluator, which worked for the example but not for the puzzle input. I went down a rabbit hole trying to debug Part 1, and I was not in a good mood when I started working on Part 2 (especially since I had to wait until after the trip).
Cindy Wu and I co-organized an RC meetup in SF! It was great seeing people at the start of the New Year. I made a note to listen to Avery Burke and Stevie Hryciw being featured in the podcast Topic Lords.
I did get to listen to Will Larson's interview on Lenny's Podcast. This quote stood out to me.
People create simplistic narratives to find villains that they work with. There are no villains in the workplace. They're just people with complex incentives that are doing complex things.
I didn't take a lot of notes about what I did today and I'm a tad behind so it'll be a sparse post.
Advent of Code Day 6 was comparatively more straightforward than other days. I spent a bit of time getting ChatGPT to add docstrings to my code so far, but had to emphasize that it should only add comments and leave the code itself untouched (I'm rather partial to having newlines).
I watched Geoffrey Litt on Dan Shipper's How do you use ChatGPT podcast series, where they built an app from scratch during the podcast. My front end isn't great so I'm looking forward to replicating what they did. I looked through Dan Shipper's Twitter feed and got to share my love for Thai food.
Author's note: If you're wondering where the quote came from, I got ChatGPT to generate quotes that describe when brute force no longer works. The other one I quite like is 'Brute force is the last resort of the incompetent', allegedly inspired by Isaac Asimov.
In the previous post, I mentioned how I looked up all the talks by Bryan Cantrill but omitted how his talk on Docker / containers was the one that inspired it. Other notable talks include the golden age of hardware / software co-design and Monktoberfest 2022 (which I highly recommend if you’re a parent).
The other thing that I didn’t mention was attending the ML Applied Projects event. I wanted to get started on 2024 ideas, among them having a local copy of my ChatGPT discussions. During the event I was told these are available to download! I don’t recall seeing this option last year, but I guess that task is done.
The notable event today was the Zig meetup at Bun HQ. It's often refreshing to chat to people in person and comforting to realize if you feel awkward at events like these, others likely feel the same way too. Jarred Sumner went through cool string optimization stuff that Bun does (for example storing details in the 12 bits not needed in an 8-byte address because the system only needs 52 bits), which is easy in Zig because Zig doesn’t have a string type.
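The spare-bits trick can be sketched in a few lines. This is only an illustration of the general idea, not Bun's actual implementation: the 52/12 split comes from the talk as described above, while `pack`, `unpack` and the sample values are hypothetical names and numbers.

```python
ADDRESS_BITS = 52                      # bits the address actually needs
TAG_BITS = 64 - ADDRESS_BITS           # 12 spare bits in an 8-byte word
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def pack(address: int, tag: int) -> int:
    """Store a small tag in the unused upper bits of a 64-bit word."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in 52 bits")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in 12 bits")
    return (tag << ADDRESS_BITS) | address


def unpack(word: int) -> tuple:
    """Recover (address, tag) from a packed word."""
    return word & ADDRESS_MASK, word >> ADDRESS_BITS


if __name__ == "__main__":
    word = pack(0x7FFF_DEAD_BEEF, 0xABC)
    address, tag = unpack(word)
    print(hex(word), hex(address), hex(tag))
```

The appeal is that the extra details travel with the address for free, since the word is still 8 bytes wide.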
Advent of Code Day 5 was the first day my code that works for the example wouldn’t work for the puzzle input. For the first part, the solution involved mapping integers to integers and my lazy implementation generated all the possibilities. The puzzle input made the dictionaries huge (though in my case strangely didn’t OOM), so I converted the dictionaries to be more compact by storing only the values needed to make the translation.
For the second part, we needed to run different ‘seed’ values through multiple integer-to-integer mappings to find the minimum final output. The brute force implementation would have taken a long time. The fix involved two changes: (1) finding the minimum contiguous range across all lookups to know how many seeds you can skip (for example, if you know 50 maps to 10 and 55 maps to 15 then you just need to keep track of 10), and (2) filling the gaps in the dictionary so you can do the first change more effectively. The times I measured:
w/o steps 1 + 2: not done even after ~4 hours
w/ step 1 : done after ~2 hours
w/ steps 1 + 2: done after ~4 mins
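The two changes described above can be sketched roughly as follows. This is a minimal illustration rather than the code behind those timings: the `(dest_start, src_start, length)` rule layout, the names `lookup` and `min_location`, and the sample rules are all assumptions, and gaps between rules are handled on the fly instead of being filled into the dictionary beforehand.

```python
import math

# Each rule is (dest_start, src_start, length): a compact stand-in for a
# dictionary holding one entry per integer.


def lookup(rules, value):
    """Return (mapped value, number of consecutive inputs sharing this offset)."""
    next_start = None
    for dest, src, length in rules:
        if src <= value < src + length:
            return dest + (value - src), src + length - value
        if src > value and (next_start is None or src < next_start):
            next_start = src
    # No rule matched: the value maps to itself until the next rule begins.
    run = next_start - value if next_start is not None else math.inf
    return value, run


def min_location(stages, start, count):
    """Smallest final output for seeds in [start, start + count)."""
    best = None
    seed, end = start, start + count
    while seed < end:
        value, skip = seed, end - seed
        for rules in stages:
            value, run = lookup(rules, value)
            skip = min(skip, run)  # shortest contiguous run across all lookups
        best = value if best is None else min(best, value)
        # The next `skip` seeds all shift by the same offsets, so their
        # outputs only increase: the first one is the minimum of the run.
        seed += skip
    return best


if __name__ == "__main__":
    stages = [[(10, 50, 6)], [(100, 12, 2)]]  # 50..55 -> 10..15, then 12..13 -> 100..101
    print(min_location(stages, 48, 10))
```

With this shape the loop runs once per contiguous run rather than once per seed, which is where the drop from hours to minutes would come from.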
All the things:
As I'm writing this, I'm letting my brute force implementation of Advent of Code Day 5 run free. My estimate was 3 hours, I'm time-boxing writing this post to 30 minutes and let's see where we are when I finish writing this post (edit: 30 minutes in, it looks like I'll need 3 more hours, so the new estimate is 3.5 hours).
Advent of Code Day 4 was definitely more straightforward than Day 24. I got to re-use the scanner I built for Day 24, which is always satisfying.
I watched Why static typing came back on the back of Bob Nystrom's tweet, where he wrote that he loved it (with emphasis). I really enjoyed the talk, so much so that I'm keen to find other talks by Richard Feldman (last time this happened was probably Bryan Cantrill) and even opened up Roc's website.
In a nutshell, the talk described how (1) dynamic languages exploded when the Internet became a thing and being slow wasn't terrible because the modem was even slower, (2) dynamic languages introduced a number of features that made it more fun to code in plus you get faster iteration cycles, but (3) nothing stops static languages from adopting these features, so with fast compilers and LSP / IDEs static languages can get the best of both worlds (but the reverse is not necessarily true).
Later in the afternoon I realized I'll be going to a Zig meetup the next day but hadn't played around that much with Zig. I watched Andrew Kelley's Intro to Zig but had a hard time concentrating. I noted how I really like one of the goals of Zig (namely "Raise the standards of software as a craft throughout the industry") and the quote below (which made me think about Ben Kuhn's In defense of blub studies), but skipped to other videos around the 15 minute mark.
Think about colleagues that you have and you can probably rank them in your mind about which ones are more capable and which ones aren’t. There’s a pattern that I noticed and sometimes you think that it’s the person with more years of experience that is more capable but I’ve found that it’s actually just the person who’s willing to dive deep the most and learn the most about the system that they’re working with who actually ends up being the more capable person.
More 'popular' videos I watched instead were Zig in 100 seconds and Initial thoughts on Zig. My takeaways were custom allocators and comptime are unique to Zig but I'll need more homework to explain this well.
The Zig's new relationship with LLVM post I found really interesting even if the later parts went over my head. I've been curious about Mojo and the post illuminated considerations and trade-offs around language toolchains. What's really cool is that making LLVM optional allows Zig to be developed fully in Zig, enables faster compilation and allows for in-place binary patching.
I wanted to work on Advent of Code Day 4, so I clicked on the link above Day 3 thinking it's the next problem. I read through the problem to learn that I needed to code up a linear solver. Yes I did study math in college but that's a while back.
Inverting a 2x2 matrix wasn't hard, but I tried replicating the example given and always ended up with the wrong answer. It turns out that I calculated the determinant of the matrix correctly, but multiplied by it instead of dividing by it.
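For reference, the inverse of the matrix [[a, b], [c, d]] is 1 / (ad - bc) times [[d, -b], [-c, a]]. Here is a minimal sketch of that formula; the function name and the sample matrix are illustrative assumptions, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def invert_2x2(a, b, c, d):
    """Invert [[a, b], [c, d]] exactly, returning a tuple of rows."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    scale = Fraction(1) / det  # divide by the determinant, do not multiply by it
    return ((d * scale, -b * scale), (-c * scale, a * scale))


if __name__ == "__main__":
    for row in invert_2x2(4, 7, 2, 6):
        print(row)
```

Multiplying the result back against the original matrix and checking for the identity is a quick way to catch a multiply-versus-divide slip.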
I completed Part 1 and realized the problems weren't sorted in order - above Day 3 was actually Day 24!
Oh well.
For this problem I invested a bit more time writing a scanner. This way parsing the text needed to be done only once instead of multiple times when using `.split()` (even though from a performance perspective it probably doesn't move the needle).
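A scanner in this sense can be quite small. The sketch below is an assumption-laden illustration, not the scanner from my solution: the token kinds, the `scan` name and the sample line are all made up for the example. It walks the text once with `re.finditer` and yields typed tokens instead of splitting the same string repeatedly.

```python
import re

# One alternation per token kind; the named group that matched tells us the kind.
TOKEN = re.compile(r"(?P<INT>-?\d+)|(?P<SEP>[@,])|(?P<WS>\s+)|(?P<BAD>.)")


def scan(text):
    """Yield (kind, lexeme) pairs in a single pass over text."""
    for match in TOKEN.finditer(text):
        kind = match.lastgroup
        if kind == "WS":
            continue  # whitespace only separates tokens
        if kind == "BAD":
            raise ValueError(f"unexpected character {match.group()!r} at {match.start()}")
        yield kind, match.group()


if __name__ == "__main__":
    # Hypothetical input line: two groups of integers separated by '@'.
    for token in scan("19, 13 @ -2, 1"):
        print(token)
```

Beyond the single pass, the nice part is that malformed input fails loudly at the offending character rather than producing a confusing error somewhere downstream.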
What's quite nice is using GitHub Actions as a test runner. I previously did this when experimenting with ChatGPT to improve my code. The small change I made this time around was to use `ruff` instead of `black`, for no better reason than that's what I kept seeing the cool kids use.
I wanted to switch gears into something more fun and realized I could get started on Advent of Code.
The Day 1 problems didn't feel as lightweight as what I would imagine Day 1 problems to be. I was tempted to have a scanner that worked both ways (since you want the first and last number in a string, number defined as '1' or 'one'), but realized after taking a break that it's more of a premature optimization.
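The simpler alternative to a two-way scanner is a single left-to-right pass that records every number it sees and keeps the first and last. This is a minimal sketch under assumptions: the helper name is hypothetical, and I am assuming numbers are the digits '1' to '9' or the words 'one' to 'nine'.

```python
WORDS = ("one", "two", "three", "four", "five", "six", "seven", "eight", "nine")

# Map both spellings to their value, e.g. "1" -> 1 and "one" -> 1.
NUMBERS = {str(value): value for value in range(1, 10)}
NUMBERS.update({word: value for value, word in enumerate(WORDS, start=1)})


def first_and_last(line):
    """Return the first and last number found in line, as digits or words."""
    hits = []
    for start in range(len(line)):
        # Checking every start position lets overlapping words both count.
        for token, value in NUMBERS.items():
            if line.startswith(token, start):
                hits.append(value)
                break
    if not hits:
        raise ValueError("no number found in line")
    return hits[0], hits[-1]


if __name__ == "__main__":
    print(first_and_last("two1nine"))
    print(first_and_last("oneight"))
```

Checking each position independently is what makes overlapping spellings such as "oneight" come out as both 1 and 8, which a consume-as-you-go scanner would miss.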
The plan was to code in Python and then port the slow parts to Mojo. Jen Hsin shared a very well-written Advent of Mojo series of posts, which on a quick review shared how "68,000x speed up is a bit ambitious here since you're not parallelizing lots and lots of matrix multiplies". What's particularly helpful is one of the posts in the series talks about usability, which would be a good post to start as a newbie.
I realized there was an open question on Mojo from the day before, so I got around to taking a closer look.
Prompt: I was listening to a talk on Mojo and came across the notion that GPUs are optimized for matrix multiplies but not necessarily for operations relating to the transformer architecture. Is this really the case?
Response: The statement you heard about GPUs being optimized for matrix multiplications but not necessarily for operations specific to the transformer architecture is partially true, and it relates to how GPUs are designed and how transformer models work.
GPU Optimization for Matrix Multiplications: GPUs are highly efficient at performing matrix multiplications, which are a core operation in many deep learning tasks. This efficiency is due to their parallel processing capabilities and architecture, which is well-suited for operations that can be broken down into smaller, simultaneous tasks – like matrix multiplications.
Transformers and Matrix Multiplications: Transformer models, popularized by architectures like GPT (Generative Pretrained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), rely heavily on matrix multiplications. These are part of the attention mechanisms and feed-forward networks within the transformer layers. GPUs are well-suited for these aspects of transformers.
Other Operations in Transformers: However, transformers also involve other types of operations that are not purely matrix multiplications. For example, operations in the attention mechanism like the calculation of attention scores (involving softmax functions) or certain element-wise operations might not be as optimally performed on GPUs as matrix multiplications. While GPUs can certainly handle these tasks, they are not as inherently optimized for them as they are for matrix multiplications.
Emerging Hardware for Transformers: Recognizing the growing importance of transformer models, there's ongoing research and development in creating hardware that's more specifically optimized for the needs of these models. This includes both modifications to GPU architectures and the development of specialized processors (like TPUs - Tensor Processing Units by Google) that can handle the mix of operations in transformers more efficiently.
In summary, while GPUs are indeed optimized for matrix multiplications and perform well for many operations in transformer architectures, they are not necessarily optimized for every specific operation within those architectures. This has led to an interest in developing hardware that can better cater to the unique demands of transformer models.