W6D3 - Gradient descent can write better code than you

Spending time at Recurse Center between roles is a great opportunity to think a bit deeper about what I’d like to do next. That thinking usually happens through chats with fellow RCs in batch, and it’s always fun to hear how others are going about their search.

I didn’t do RC this time around, so that ‘refinement’ process has to happen on the go, through real interviews. An interesting question that came up: “what’s special about having machine learning in your system?”

I thought about my time at Square, and what came to mind was the idea of moving from a deterministic system to a probabilistic one. Suppose a seller owes Square $100, and Square initiates a $100 ACH debit to recover those funds. It takes three business days before you know whether the debit succeeds. Now suppose that the next day, the seller needs to be credited $50. What do you do?

In the optimistic case, you assume the debit will succeed, so you let the $50 go through. In the pessimistic case, you assume the debit will fail, so you hold the $50 for two business days. Which one is the right call?
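To make the binary choice concrete, here’s a toy sketch of the two fixed policies (the function names are hypothetical, not anything from Square’s systems). Note that neither one looks at the seller at all:

```python
def optimistic_policy(seller) -> bool:
    """Assume the in-flight debit will succeed: release the $50 now."""
    return True


def pessimistic_policy(seller) -> bool:
    """Assume the debit will fail: hold the $50 until the debit settles."""
    return False
```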

The flow with ML is to train a model on historical data. If the seller looks closer to one where the debit would succeed, let the $50 go through; if the seller looks closer to one where the debit would fail, hold it. Hence we switch from making one fixed binary decision to scoring each seller, with a threshold that determines how we act.
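As a rough illustration of that flow, here’s a minimal sketch using scikit-learn. Everything in it is made up: the features, the synthetic training data, and the 0.8 threshold are placeholders, not anything from a real risk system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical historical data: one row per past debit, with made-up
# seller features (say, account age, average balance, past failures)
# and a label for whether the ACH debit ultimately succeeded.
rng = np.random.default_rng(0)
X_history = rng.normal(size=(1000, 3))
true_weights = np.array([1.5, -0.5, 1.0])
y_history = (X_history @ true_weights + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression()
model.fit(X_history, y_history)


def release_credit(seller_features, threshold=0.8) -> bool:
    """Release the $50 only if the model thinks the pending debit
    is likely enough to succeed."""
    p_success = model.predict_proba([seller_features])[0, 1]
    return p_success >= threshold


# Score one (made-up) seller.
print(release_credit([0.4, -1.2, 0.9]))
```

Where you set the threshold then becomes a product decision: lower it and more credits go out sooner, at the cost of more losses when debits fail.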

Of course, there is also the question of whether our historical data is sufficient for model training, and, if it is, whether the return on investment is worth the extra complexity.

Andrej Karpathy takes this a step further in his post Software 2.0, jokingly describing it as follows:

Gradient descent can write better code than you. I’m sorry.

This was written in 2017, and it’s perhaps even more pertinent now.