There's a weekly book club at RC on Martin Kleppmann's Designing Data-Intensive Applications. Recent discussions reminded me of things I learned at Bradfield; it's been a fun process of rediscovery.
The first is on how caching affects memory access. Let's start with Moore's law. In the 1960s, everything was slow. Processors got a big speedup over the 70s and 80s; memory access sped up too, but not to the same degree. By the 90s we had a real gap - it's fast to process data but slow to access data, say by a factor of 100.
How did we bridge this gap? Caching. The idea is if it's slow to go all the way to RAM for data, let's move it a bit closer. This was achieved with a piece of dedicated hardware called the cache. Thus we have a 'memory hierarchy' - first registers, next cache, then RAM. Estimates of access times can be found here.
The cache retrieves data a bit like how we get library books. Instead of getting one book at a time, we would get a couple of books in one go. The cache's analogue of the number of books is what's called a cache line. For simplicity let's just say that 64 bytes of data is retrieved at a time and stored in the cache.
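To see the cache line at work, here's a rough sketch (mine, not the book club's) using numpy: both sums touch the same number of elements, but the strided one pulls in a fresh 64-byte cache line for nearly every element, so it tends to be noticeably slower. Exact timings will vary by machine.

```python
import numpy as np
import timeit

STRIDE = 8                                   # 8 x 8-byte floats = 64 bytes, one cache line
a = np.arange(8_000_000, dtype=np.float64)   # ~64 MB, much bigger than the cache

sequential = a[: len(a) // STRIDE]           # contiguous block, same element count
strided = a[::STRIDE]                        # roughly one element per cache line

print("sequential:", timeit.timeit(sequential.sum, number=100))
print("strided:   ", timeit.timeit(strided.sum, number=100))
```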
Thoughts about caching came up when we were discussing database indexing. We can index a database with binary search trees, but this is inefficient because each node of a binary tree occupies only a fraction of a cache line. This means there'll be more cache misses (when the data we're looking for is not in the cache) relative to cache hits.
B+ trees remedy this by increasing the branching factor, i.e. the number of references to child nodes in each node of the tree. This ensures the whole cache line is used up. In practice the branching factor depends on the amount of space required to store node references and the range boundaries. A toy implementation of a B+ tree in Python can be found here.
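As a back-of-the-envelope sketch (my own numbers, not the book's): with fixed-size keys and child references, the branching factor is just however many children fit in the node's byte budget, whether that budget is a 64-byte cache line or a 4 KB disk page.

```python
def branching_factor(node_bytes: int, key_bytes: int, ref_bytes: int) -> int:
    # A node with n children holds n child references and n - 1 separator keys:
    # n * ref_bytes + (n - 1) * key_bytes <= node_bytes
    return (node_bytes + key_bytes) // (key_bytes + ref_bytes)

print(branching_factor(64, 8, 8))     # one 64-byte cache line -> 4 children
print(branching_factor(4096, 8, 8))   # a 4 KB page -> 256 children
```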
Encoding
The second is on encoding. Computers store data in binary, i.e. a collection of 1s and 0s, but binary data by itself doesn't tell us what it means. We could think about it as a series of Booleans. We could also 'chunk' the binary data 8 bits at a time and have each chunk represent a character. This encoding is called ASCII, and the table can be found by typing `man ascii` in the Terminal. Each ASCII character is stored in one byte, i.e. 8 bits (strictly speaking, ASCII only needs 7 of them).
Suppose we were to send integer values to each other, say the number 1000. We could do it in 'text' format, where each character takes up one byte. This means we'll need 4 bytes for "1000". Alternatively we could send it in 'binary' format, where 4 bytes can represent anything from -2,147,483,648 to 2,147,483,647 (i.e. the range of a signed 32-bit integer). In other words, sending integers in binary format is a lot more compact.
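A quick way to check this in Python is with the struct module. For 1000 itself the two forms happen to be the same size; the saving kicks in as the numbers grow.

```python
import struct

n = 1000
as_text = str(n).encode("ascii")      # b'1000' -> one byte per character
as_binary = struct.pack("<i", n)      # signed 32-bit little-endian integer

print(len(as_text), len(as_binary))   # 4 4 -- a wash for 1000
print(len(str(2_000_000_000)), len(struct.pack("<i", 2_000_000_000)))  # 10 vs 4
```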
Thoughts about encoding came up when we were discussing data encoding formats. A commonly-used format is JSON, which is all text. This keeps things simple, but the lack of compactness means it'll be slower to send over the wire. Protocol Buffers, commonly known as protobuf, allows integers to be encoded in binary format, making them more compact over the wire. This makes a real difference not just for integers but also for binary attachments, like images.
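The attachment point is worth spelling out: JSON has no way to carry raw bytes, so binary data is usually base64-encoded first, which adds roughly a third to the size. A rough sketch:

```python
import base64
import json
import os

blob = os.urandom(30_000)             # stand-in for a small image
as_json = json.dumps({"image": base64.b64encode(blob).decode("ascii")})
print(len(blob), len(as_json))        # ~30,000 raw bytes vs ~40,000 bytes of JSON
```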
A simple binary encoding is Bencode, which is the encoding used by BitTorrent. Bencode supports strings, integers, lists and dictionaries. Alas it's not that widely used, despite being a "human-readable" binary encoding format. A toy implementation of Bencode can be found here.
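For a flavour of the format, here's a minimal encoder sketched from the Bencode rules (strings are length-prefixed, integers wrapped in i...e, lists in l...e, dictionaries in d...e with sorted keys); a real implementation would also need decoding and stricter error handling.

```python
def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value                       # 42 -> b'i42e'
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)        # b'spam' -> b'4:spam'
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):                      # keys are sorted byte strings
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

print(bencode({"spam": ["a", "b"], "count": 3}))     # b'd5:counti3e4:spaml1:a1:bee'
```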
Functional programming
It's hard to miss the excitement around Haskell at RC. A few weeks back I told myself I'd like to spend a bit of time on it at my next RC to see what the hype is about. Then I found myself at a stopping point on other projects, and conveniently had Haskell lined up.
Like most things in life, plans only go so far. I came across category theory while reviewing RC discussions on Haskell. I watched the first video in Bartosz Milewski's Category Theory for Programmers series; I was hooked.
I'll admit part of it is Milewski's infectious enthusiasm on the subject. In the preface of his book, he describes how category theory is a "treasure trove of extremely useful programming ideas", which inspired a lot of ideas in Haskell. It's an area of math that's unlike calculus or algebra, because instead of dealing with particulars, it deals with structure. It deals with the kind of structure that makes programs composable.
What I also found quite interesting is Milewski's thread on the 'rise' of functional programming. For a long time we were happy with imperative programming, and used object-oriented programming as the paradigm for abstraction by hiding complexity (in particular, hiding state) inside the object.
The 'end' of Moore's law has since brought multicore processors (on this topic, Sophie Wilson has an excellent talk here). However, imperative programming allows for side effects, and hiding state in objects makes data races more likely. While we can carry on doing what we've been doing thus far, it'll be an ongoing struggle in this new concurrent world.
Why needlessly endure complexity? Why not simply shift to the functional programming paradigm, where objects are immutable and functions are pure? Instead of building on shaky ground, Milewski's rallying cry is to fix the foundations so that we can all move forward more effortlessly.
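To caricature the difference in Python (my own toy contrast, not Milewski's): a function that hides mutable state gives answers that depend on what happened before, while a pure function always maps the same input to the same output, which is exactly what makes it safe to run concurrently.

```python
balance = 100

def withdraw(amount):             # impure: reads and mutates hidden global state
    global balance
    balance -= amount
    return balance

def withdrawn(balance, amount):   # pure: returns a new value, touches nothing else
    return balance - amount

print(withdraw(10), withdraw(10))              # 90 80 -- depends on call order
print(withdrawn(100, 10), withdrawn(100, 10))  # 90 90 -- always the same
```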
Content: The moral bucket list
David Brooks disarmed me with "I’m a pundit, more or less paid to appear smarter and better than I really am", yet this essay makes me pause to reflect every time.
When I meet such a person it brightens my whole day. But I confess I often have a sadder thought: It occurs to me that I’ve achieved a decent level of career success, but I have not achieved that. I have not achieved that generosity of spirit, or that depth of character.
Content: How to interview your interviewers
I've started doing interviews recently. Thinking about interviews as a chat to get to know potential colleagues can help take the edge off.
That’s because finance, like other forms of human behavior, underwent a change in the twentieth century, a shift equivalent to the emergence of modernism in the arts—a break with common sense, a turn toward self-referentiality and abstraction and notions that couldn’t be explained in workaday English.
In poetry, this moment took place with the publication of “The Waste Land.” In classical music, it was, perhaps, the première of “The Rite of Spring.” Jazz, dance, architecture, painting—all had comparable moments. The moment in finance came in 1973, with the publication of a paper in the Journal of Political Economy titled “The Pricing of Options and Corporate Liabilities,” by Fischer Black and Myron Scholes.