RC W3D1 - Fake it till you make it

On faking it

According to my GitHub profile, I'm an Arctic Code Contributor. GitHub has decided that one of my repositories is worthy of preserving for future generations, alongside the illustrious source code for Python, Go, Rust, Linux and React. My code will live in a "very-long-term archive designed to last at least 1,000 years".

The repo in question is a set of notebooks created to help people get started with pandas and scikit-learn, i.e. Python libraries for data manipulation and machine learning. I've come across references to it in French, Chinese and Korean.

At this point you might think I'm bragging. The funny thing is it all started as a bit of a ruse.

I wanted to get a job in data science. I thought presenting at a conference would be a selling point. I created a Meetup group. I hosted a social to get members. I got the members to attend my presentations. I used the presentations to get a speaker's slot at PyCon UK.

Hence the title for today's blog post. We all had a start somewhere; I'm just not sure how I feel about that being what's remembered. I guess I have to work even harder now...

Open source

A thought that did not sit well over the weekend was my previous claim that open source is not as inaccessible as I initially thought. This holds true, but it is incomplete. My reference to the gap between generating Fibonacci and production-grade code alludes to this, but I hesitated to publish what I initially drafted. I was worried what I said would be seen as discouragement.

It helps having worked in a production environment; some may need a bit of hand-holding to feel more comfortable with, say, the build-test-style suite. That said, I think this hand-holding process can be a helpful how-to guide. I'll ponder on this in the coming weeks.

Content: External validation

In that same post I also mentioned my fear of exposure, when in reality people tend to care too little rather than too much. The content for today, a Tim Urban post titled Taming the Mammoth, illustrates this beautifully.

https://waitbutwhy.com/2014/06/taming-mammoth-let-peoples-opinions-run-life.html

Being approved of by one type of person means turning another off. So obsessing over fitting in with any one group is illogical, especially if that group isn’t really who you are. You’ll do all that work, and meanwhile, your actual favorite people are off being friends with each other somewhere else.

RC W3D2 - Choose boring technology

Rust

I've been spending the start of the week learning Rust. I opted for O'Reilly's Programming Rust initially as it closely follows UPenn's CIS198 (problem sets FTW), but have since switched to the Rust book as I'm finding the style a bit more fun.

Content: Innovation tokens

I knew Dan McKinley's Choose Boring Technology post would feature on my content list. I especially love the framing of innovation tokens, and how you have to choose your battles.

https://mcfunley.com/choose-boring-technology

What caught me by surprise re-reading it today is how it feels less clear what's considered as spending those tokens. NodeJS is widely used, and I found out recently Stripe uses MongoDB for core parts of its key-value infrastructure (cue "is your database web scale" jokes).

My experience with the supposedly battle-tested Airflow hasn't been positive. We employed a number of workarounds to get it to play nice for long-running (~8 hours) ML model training on containers. We ended up partly moving to a scheduler built in-house (for model training jobs), before moving wholesale (plus ETLs) to Argo. Sure Argo is built for k8s, but the sense with Airflow was "it shouldn't be this painful"...

It's hard to disagree with the principle, as per the post, that one should always consider cost-benefit trade-offs carefully. That said it does seem a lot simpler in the abstract. Is Rust sufficiently boring?

RC W3D3 - A tool for every trade

Python

I've been spending a fair bit of time learning new programming languages recently, and I'm quite happy to have the progression be Python to Go to Rust. When learning something new, your motivation gets a boost when you can do lots of things and do them fast. From this perspective, it's hard to beat an interpreted language like Python.

Python was the main language at my previous role, and this choice made sense given how key ML models were to the business model. Having the data science and engineering teams share the same language helped reduce maintenance and tooling overheads, as well as provide better context in handoffs.

The drawbacks were speed and lack of types. For speed, we had bindings to C++ on the hot path - the need for speed in an auction setting is clear. For types, we gradually introduced type comments with mypy. Why types? It's easy to introduce bugs when refactoring a codebase with limited guard rails; types are like tests that you get for free.

Go

I had the chance to learn a bit of Go just before RC, motivated partly because it's the main language for Bradfield's CS Intensive course. I was pleasantly surprised to find a lot of tooling we used for Python comes built-in in Go (brief discussion here). Go also has first class support for concurrency; the language is a popular choice for servers given the need to support multiple clients simultaneously. The syntax was surprisingly easy to pick up coming from Python.

Rust

I wanted to know more about the front end at RC, but curiously got into WebAssembly in the first week. I then found out from Tom how Rust has the best from-scratch support for compiling to WebAssembly. Another reason Rust works better is the smaller runtime - the smallest achievable uncompressed binary size from Go is ~2 MB vs ~2 KB for Rust. I'm not as familiar with the finer details; perhaps a sizeable part of that can be attributed to garbage collection in Go.

In summary, Python for fast prototyping and 'glue code', Go for concurrency and Rust for the low level stuff. Next comes lots and lots of practice.

Content: Disruptive innovation

If you've come across the theory of disruptive innovation, Jill Lepore offers an interesting and persuasive take in the New Yorker.

https://www.newyorker.com/magazine/2014/06/23/the-disruption-machine

Love how there's a reference to HBO's Silicon Valley.

RC W3D4 - Enabling business models with technology

WebAssembly

Every Thursday there's a Front End Hack and Tell - we commit to working on something front end-related at the start of the session, and circle back at the end to share what we came up with. I've been learning Rust to get into WebAssembly, but so far I'm still ramping up on the Rust part. I learned MIPS Assembly a while ago, and thought a refresher might help me approach WebAssembly from a different angle.

This didn't quite work out as planned. The notes I have from that time assumed I'd be revising the material, as opposed to picking it up practically from scratch. Plus WebAssembly is a stack machine, not a register-based one like MIPS.
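To make the stack-vs-register distinction a bit more concrete, here is a minimal sketch of two toy interpreters. Both instruction sets (`RegOp`, `StackOp`) and both `run_*` functions are hypothetical names made up for illustration; they are not real MIPS or WebAssembly encodings. The point is only that a register-style add names its operands and destination, whereas a stack-style add takes its operands implicitly from the top of the stack.

```rust
// Hypothetical toy instruction sets, for illustration only.

/// Register style: each instruction names its operands and destination.
#[derive(Debug, Clone, Copy)]
enum RegOp {
    /// regs[dst] = value
    LoadImm { dst: usize, value: i64 },
    /// regs[dst] = regs[a] + regs[b]
    Add { dst: usize, a: usize, b: usize },
}

/// Stack style: operands are implicit, taken from the top of the stack.
#[derive(Debug, Clone, Copy)]
enum StackOp {
    Push(i64),
    Add,
}

/// Runs a register program on four registers and returns their final values.
fn run_registers(program: &[RegOp]) -> [i64; 4] {
    let mut regs = [0i64; 4];
    for op in program {
        match *op {
            RegOp::LoadImm { dst, value } => regs[dst] = value,
            RegOp::Add { dst, a, b } => regs[dst] = regs[a] + regs[b],
        }
    }
    regs
}

/// Runs a stack program and returns the value left on top, if any.
/// Returns None if an Add finds fewer than two operands on the stack.
fn run_stack(program: &[StackOp]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for op in program {
        match *op {
            StackOp::Push(v) => stack.push(v),
            StackOp::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
        }
    }
    stack.pop()
}

fn main() {
    // Register version: load 2 and 3, then add them into register 1.
    let regs = run_registers(&[
        RegOp::LoadImm { dst: 2, value: 2 },
        RegOp::LoadImm { dst: 3, value: 3 },
        RegOp::Add { dst: 1, a: 2, b: 3 },
    ]);
    // Stack version: push 2 and 3, then add whatever is on top.
    let top = run_stack(&[StackOp::Push(2), StackOp::Push(3), StackOp::Add]);
    println!("register result: {}, stack result: {:?}", regs[1], top);
}
```

The same sum needs three operand names in the register version and none in the stack version, which is roughly why notes written for one model don't transfer directly to the other.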

C

I spent the rest of the hacking time on Rust, but my refresher reminded me of the story of how C came about. Assembly can be thought of as a set of instructions that the processor understands. For example, `add $t1, $t2, $t3` means add the values in $t2 and $t3, then place the result in $t1. The tricky bit is each processor has its own instruction set, so you'll need to compose one set of instructions for a MIPS processor and a different set for an Intel x86.

The idea behind C is a language that's one layer up from the processor (so it works across both), but also sufficiently low level so things run fast. What did the creators of C do with it? They rewrote Unix in it.

Content: Design

Having featured data science and product management, let's talk about design. First, a preamble.

Square gave every person with a smartphone the ability to accept credit cards. The problem was, now every fraudster could use Square to cash out stolen credit cards. Square solved this problem with machine learning. By leveraging the improved ability of ML models to flag suspicious payments, Square could make accepting credit cards accessible to everyone.

If Square is an example of a business model that becomes viable (amongst other things) on the back of machine learning, can we think about the analogue for, say, WebAssembly? Here Figma comes to mind - WebAssembly allows the functionality of the Adobe design suite to be run in the browser, but with the performance of a native app.

https://kwokchain.com/2020/06/19/why-figma-wins

What I especially loved about the post is the framing of tighter iteration cycles. Before Figma, the design process involved back-and-forth between teams via e-mail. With Figma, all design and feedback are stored centrally in the cloud, allowing more context to be retained at handoffs (analogous to the same language across teams in a previous post). On my prior reading I must have overlooked the reference to the other cool technologies deployed - WebGL and CRDTs.

In summary, I learned a lot working as a data scientist in a company where machine learning makes or breaks the business. I'm curious to see how new business models become viable with something like WebAssembly. Though first, one needs to understand WebAssembly...

RC W3D5 - Reflecting on experience

Go

I've been spending the week on reading and writing code. I needed a breather, and made it a touch lighter today with blog posts and YouTube videos.

I found this post by Discord describing how they moved one of their services from Go to Rust. The engineering team discovered latency spikes from garbage collection, found that performance issues persisted after tuning, and then decided to make the switch. Jon Gjengset in this video described how in Go, "even though concurrency is very easy, that concurrency is very easy to shoot yourself in the foot with".

Rust

I'm a believer that there's a tool for every trade, and keen to develop the sense where one would choose one tool vs another. I didn't pick this up in my earlier readings, but found out that creating a doubly-linked list in Rust is non-trivial.
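As a rough illustration of why it's non-trivial: each node wants to be pointed at from both sides, which clashes with Rust's single-owner model. One common workaround, sketched below under my own assumptions, is shared ownership via `Rc<RefCell<...>>` for the forward links and `Weak` for the backward links so the two directions don't keep each other alive forever. The `Node` and `DoublyLinkedList` types and their methods are hypothetical names for this sketch, not a standard API (the standard library has its own `std::collections::LinkedList`).

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// A node is shared (Rc) and mutable through shared handles (RefCell).
struct Node {
    value: i32,
    /// Forward link: a strong reference that keeps the next node alive.
    next: Option<Rc<RefCell<Node>>>,
    /// Backward link: a weak reference, to avoid a reference cycle.
    prev: Option<Weak<RefCell<Node>>>,
}

struct DoublyLinkedList {
    head: Option<Rc<RefCell<Node>>>,
    tail: Option<Rc<RefCell<Node>>>,
}

impl DoublyLinkedList {
    fn new() -> Self {
        DoublyLinkedList { head: None, tail: None }
    }

    /// Appends a value at the end of the list.
    fn push_back(&mut self, value: i32) {
        let node = Rc::new(RefCell::new(Node { value, next: None, prev: None }));
        match self.tail.take() {
            Some(old_tail) => {
                // Link both directions: weak backwards, strong forwards.
                node.borrow_mut().prev = Some(Rc::downgrade(&old_tail));
                old_tail.borrow_mut().next = Some(Rc::clone(&node));
                self.tail = Some(node);
            }
            None => {
                // Empty list: the new node is both head and tail.
                self.head = Some(Rc::clone(&node));
                self.tail = Some(node);
            }
        }
    }

    /// Walks the forward links from head to tail.
    fn to_vec(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut current = self.head.clone();
        while let Some(node) = current {
            out.push(node.borrow().value);
            current = node.borrow().next.clone();
        }
        out
    }

    /// Walks the backward links from tail to head.
    fn to_vec_rev(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut current = self.tail.clone();
        while let Some(node) = current {
            out.push(node.borrow().value);
            current = node.borrow().prev.as_ref().and_then(|w| w.upgrade());
        }
        out
    }
}

fn main() {
    let mut list = DoublyLinkedList::new();
    for v in [1, 2, 3] {
        list.push_back(v);
    }
    println!("{:?} / {:?}", list.to_vec(), list.to_vec_rev());
}
```

Compare this with the handful of lines the same structure takes in Python or Go; the extra ceremony is the ownership rules asking you to spell out who keeps each node alive.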

Content: Farouk al-Kasim

I re-read my favourite articles before sharing them on here, and some really do make me pause to reflect. This is the story of Farouk al-Kasim, who left Iraq months before the Ba'ath party took power and then prevented Norway from squandering its new-found oil wealth (enclosed below).

Content: Tango pour Claude

I remember hearing this and being very, very happy I asked for the name of the piece.

Content: BBoy Cloud

A little random I know, but why not when you have this much style.

Content: Siddhartha

Re: self-reflection, I have to include Hermann Hesse's Siddhartha. I love how the book highlights the importance of experience. It's tricky choosing just a single quote, but this one ties in well with being at RC to learn (or perhaps, learning through the experience of what works and what doesn't).

Has any samana or any Brahmin ever feared that someone might come and grab him and rob him of his learning and his piety and his profundity? No, for they are his own, and he gives of them only what he wishes to give and to whom he wishes to give. It is the same, exactly the same, with Kamala, and with the joys of love. Red and beautiful are Kamala’s lips, but try to kiss them against Kamala’s will, and you will not get a drop of sweetness from the lips that know how to give so much sweetness! You learn easily, Siddhartha, then learn this too: One can get love by begging, by buying, by receiving it as a gift, by finding it in the street, but one cannot steal it.



RC W4D1 - Maximize your learning

On exposure

Every Friday I would look back at the past week, but would be too wiped out to articulate it clearly. This occurrence was especially annoying last week, as it marked a quarter into the batch.

I've covered a number of different areas in the first three weeks. At a coffee chat today I realized the time so far had not reinforced my interest in any one area in particular, but instead exposed me to a whole new world of things. Suppose I had three choices before; how strongly I feel about each of them hasn't really changed. However, I now know there are seven choices.

Rust

I'm very much enjoying Rust. I haven't done it for long so I won't say too much, but it appears the compile times can take a while. That being said, a toy implementation of generating the 10,000th prime number averages at 0.25 seconds for Rust vs 3.88 seconds for Python (code here, inspired by post here).
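For a sense of what such a toy implementation might look like, here is a minimal sketch using trial division against the primes found so far. This is my own illustrative version with a hypothetical `nth_prime` function, not necessarily the code that produced the timings above.

```rust
/// Returns the n-th prime (1-based) using trial division against the
/// primes found so far. A toy approach; a sieve would be faster.
fn nth_prime(n: usize) -> u64 {
    assert!(n >= 1, "n is 1-based");
    let mut primes: Vec<u64> = Vec::with_capacity(n);
    let mut candidate: u64 = 2;
    while primes.len() < n {
        // Only primes up to sqrt(candidate) need to be checked.
        let is_prime = primes
            .iter()
            .take_while(|&&p| p * p <= candidate)
            .all(|&p| candidate % p != 0);
        if is_prime {
            primes.push(candidate);
        }
        candidate += 1;
    }
    primes[n - 1]
}

fn main() {
    println!("{}", nth_prime(10_000)); // prints 104729
}
```

Timings will vary with the machine and with how the build is done (for instance `cargo run --release` vs a debug build), so treat any comparison as indicative only.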

Over the weekend I discovered RustPython, a Python interpreter in Rust. Since Rust can be compiled to WebAssembly, the interpreter can be made to run in the browser. The FOSDEM 2019 talk can be found here.

Content: Learning at startups

I've always believed in choosing a role that allows you to maximize your learning. In the context of startups, Paul Buchheit says it best.

https://triplebyte.com/blog/interview-with-gmail-creator-and-y-combinator-partner-paul-buchheit

I would suggest thinking about joining a startup as more like going to grad school to learn. Optimize around learning when choosing a job. That’s the best thing. Then if a startup fails, you can always go back to Google and probably get paid a lot more, because now you’re actually a much better engineer than you would have been if you had stayed there like everyone else.

RC W4D2 - Let's try something different

On writing

It's September 1, let's try something different. Problem - I'm usually too wiped out at the end of the day to be articulate (or at least, as articulate as I'd like to be). Solution - write notes throughout the day on what I plan to write about.

As an aside, my blog posts are paying off already - I referred to a previous post on the simplest way to compile Rust to WebAssembly, instead of my notes.

WebAssembly

I know. I've been going on about WebAssembly for a while now. Part of it is accountability. It's an incentive for me to learn something I keep going on about to avoid embarrassment. I do believe the responsiveness that WebAssembly provides would accelerate the shift of native apps to be run in the browser. Yesterday I realized it could be even more groundbreaking.

Sara shared what she's reading on her blog, and it's a fantastic list. The post by Andreas Rossberg on Motoko had a link to a comprehensive discussion of WebAssembly in the Communications of the ACM (direct link here), as well as the following quote.

Wasm’s main difference compared to other virtual machines is that it is not optimized for any specific programming language but merely abstracts the underlying hardware, with a byte code directly corresponding to the instructions and memory model of modern CPUs. On top of that, Wasm supports sandboxing through strong modularity and a rigid mathematical specification that ensures that execution is safe, free of undefined behaviour, and (almost) entirely deterministic. Moreover, these properties actually have a machine-verified mathematical proof!

This is in addition to Solomon Hykes, a co-founder of Docker, tweeting that had WebAssembly existed in 2008, there would have been no need for Docker. Now I'm puzzled. Why isn't there more hype?

What I can say is the learning resources seem sparse. Perhaps due to things changing quickly. I had been trying to tweak a number of Hello World examples without much success, along the way discovering the myriad of tools available. I gave up and decided to work on the Programming WebAssembly with Rust book instead. The book uses wasm-bindgen for Rust-JavaScript interoperability, but it's not immediately clear how using wasm-pack in addition would help.

I've decided I'm going to come up with my own resources. Watch this space.

The advantage of going through a book is comprehensiveness. In the book I found the answer to a previous question posed - why do you need a web server to run WebAssembly? It turns out cross-origin rules in the browser block reads from the file system.

Re: portability, when I compile Rust on my laptop, the default toolchain is stable-x86_64-apple-darwin, i.e. channel-processor-vendor-OS. For WebAssembly, the target is wasm32-unknown-unknown, i.e. no particular vendor or OS is assumed. It's designed to be compile once, run everywhere!
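For the curious, a compiled Rust program can report what it was built for via the constants in the standard library's `std::env::consts` module. A minimal sketch, where `target_summary` is a hypothetical helper name of my own:

```rust
use std::env::consts::{ARCH, OS};

/// Builds a short "arch-os" description of the target this binary
/// was compiled for, using constants fixed at compile time.
fn target_summary() -> String {
    format!("{}-{}", ARCH, OS)
}

fn main() {
    println!("compiled for: {}", target_summary());
    // cfg! is also evaluated at compile time, so this check is
    // decided by the target, not by the machine running the binary.
    if cfg!(target_arch = "wasm32") {
        println!("this build targets WebAssembly");
    }
}
```

The values are baked in when compiling, so the output depends on the target chosen at build time, not on where the binary ends up running.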

Java

Last week I briefly mentioned the portability of C; let's complete that thread. C binaries would only work for a specific OS, so C when compiled for Linux would not work on Windows. Java, however, is compiled to Java bytecode that would run on any Java VM by design. Thus Java achieves portability across operating systems, which extends to all JVM-based languages - Clojure, Groovy, Kotlin, Scala. It's the original compile once, run everywhere.

Content: Python interpreters

Coming across RustPython made me look up Allison Kaptur's post on 500 lines or less: A Python interpreter written in Python.

http://www.aosabook.org/en/500L/a-python-interpreter-written-in-python.html

I then discovered this is something she worked on while at RC!

RC W4D3 - L'esprit de l'escalier

On perfect replies

Today RC hosted Remotehost, a virtual technical talk series (and the remote version of Localhost). The theme was virtual spaces, the demos were really cool. At the chat roulette session afterwards I was paired with Mai. I had shared some feedback with Mai earlier in the week, and I mentioned how my feedback was even better articulated when I described it to someone else after.

This is such a common occurrence that there's a term for it - l'esprit de l'escalier. As per Wikipedia, it's the predicament of thinking of the perfect reply too late. Curiously, the opposite of touché.

CSS

I was paired with Julia Evans next, who described a recent focus on CSS and new features like Flexbox and Grid (also shared on Twitter here). I brought this up at the NixOS event later in the day, and discovered that the latest version of CSS is Turing complete. Mind blown!

Swift

I've got my hands full with new languages, but somehow keep coming across Swift and Julia (no relation) over the past week. Fun fact: Rust came out of a personal project by Graydon Hoare when he was at Mozilla; he left the project and later worked on Swift (both compile via LLVM).

Julia

I first came across Julia in 2015. There was optimism on how the language would take over from Python for scientific computation. It's not clear to me how much this is the case; if anything, Python has grown a lot and is even taking on Excel use cases. Julia is 1-indexed; this doesn't seem like a great choice to boost adoption (it adds to context switching), but I'm making a note to self to look up why this decision was made.

Content: Jeff Dean and Sanjay Ghemawat

I thought I was past hero worship. It's hard not to when it's about these two.

https://www.newyorker.com/magazine/2018/12/10/the-friendship-that-made-google-huge

Didn't think I'd see this in the New Yorker. Love the prose describing systems.

RC W4D4 - The only intro you'll need

WebAssembly

I was preparing slides on WebAssembly today when I came across Lin Clark's cartoon intros. They're spectacular. I wish they had been the first thing I read on the topic; it appears l'esprit de l'escalier is a thing for content too.

The background articles also provide helpful context and an easy read.

It's amazing how Google, Mozilla, Apple and Microsoft actually got together and agreed on the specs. The project has now expanded beyond the browser, with wasmtime as an independent runtime and WASI as the unified interface, in addition to its own foundation called the Bytecode Alliance.

I do wonder, more broadly, if the technology will further consolidate the dominance that large tech companies have, or will a thousand startups bloom? Will there be a lot of end users who like it but only a few who love it? What will the killer app be (or will there actually be one)? I was late to notice iPhones and bitcoin as platforms. I'm ecstatic at being able to follow the WebAssembly life cycle from an early stage, and see where it goes from here.

Julia

I woke up in the middle of the night, had trouble going back to sleep and actually looked up why Julia is 1-indexed (or rather, why most languages are 0-indexed). I came across this post which had the following quote.

So: the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware...

Intuitively it sort of makes sense - if you start from zero it's a no-op vs having to do an `add immediate`. Fascinating.
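My intuition, sketched as address arithmetic (the function names here are hypothetical and this is only my reading, not something taken from the quoted post): with a zero-based index the byte offset is just index times element size, while a one-based index needs an extra subtraction first.

```rust
/// Byte offset of an element when indices start at 0.
fn offset_zero_based(index: usize, elem_size: usize) -> usize {
    index * elem_size
}

/// Byte offset of an element when indices start at 1.
/// The extra subtraction is the additional step; index must be >= 1.
fn offset_one_based(index: usize, elem_size: usize) -> usize {
    (index - 1) * elem_size
}

fn main() {
    // The first element sits at offset 0 in both schemes.
    println!("{} {}", offset_zero_based(0, 8), offset_one_based(1, 8));
}
```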

Content: Dev Ops

This week's feature is Increment's post on cloud migration; it fits in nicely with a friend joining Stripe this week. I enjoyed reading how Netflix introduced planned instance failures so the on-call team can deal with them during business hours. What's particularly impressive is the decision to implement this workflow at a time of rapid growth, forcing adoption of industry best practice at the same time as battling other fires.

https://increment.com/cloud/case-studies-in-cloud-migration

A hat tip to the marketing team who coined the term 'chaos engineering' from 'chaos monkeys'.

RC W4D5 - The ride of a lifetime

Kubernetes

I had a coffee chat with Sophia today. We got talking about Kubernetes, and I shared this comic that actually explains pretty well what the 'open source container orchestration system' is.

I find it interesting how polished Google's more recent open source efforts are, with serious marketing budgets. Another project that had a huge launch (though not as much fanfare these days) was TensorFlow. I wonder if this was the lesson from having come up with MapReduce, only to see the technology popularized as Hadoop instead?

Content: International Committee of the Red Cross

This article came to the top of the list on what next to feature. It highlights the delicate balance between maintaining confidentiality to help those in need, against exposing the morally-corrupt but risking access in future missions (enclosed below).

What I didn't pick up as prominently on the first reading was how the founder of Médecins Sans Frontières came out of ICRC, with the goal of creating an organization that combined relief with advocacy.

Content: Life in Mono

I can't remember where I first heard this. How apt given the song sets a wistful mood.

Content: Yassin Falafel

OK I'm going to gush about Square again, though it's a timely refrain of the theme of adapting to an adopted homeland (Farouk al-Kasim of a previous post).

Seriously, though, I was blown away when I watched this.

Content: The Ride of a Lifetime

The first book I listened to on Audible was Bob Iger's biography. In general I've not been a fan of this genre; biographies I've read in the past have narratives that go on and on about how this person was destined to be successful, all the stars align. In this one, I recall he worked hard and treated people with respect. I remember an honest and candid retelling of his rise to the top, where in no way was the path assured.

The Ride of a Lifetime and Shoe Dog - I'd love to find more books like these. I recorded this exact quote on my phone so I could replay it again and again.

A company’s culture is shaped by a lot of things, but this is one of the most important - you have to convey your priorities clearly and repeatedly. In my experience, it’s what separates great managers from the rest. If leaders don’t articulate their priorities clearly, then the people around them don’t know what their own priorities should be. Time and energy and capital get wasted. People in your organization suffer unnecessary anxiety because they don’t know what they should be focused on. Inefficiency sets in, frustration builds up, morale sinks.

You can do a lot for the morale of the people around you (and therefore the people around them) just by taking the guesswork out of their day-to-day life.