I’m Dan McKinley. That’s me in the hole. It’s a metaphor.
I work for a company called Stripe. Before that, I was an early employee at Etsy, where I worked for a lot of years. I acquired a great deal of practical experience at Etsy so I’m going to be referring to my time there a lot.
Etsy wasn’t mature as an engineering organization when I got there, but I was eventually spoiled rotten when it came to technology culture at Etsy. As I’ve gone back into the wider tech world, I’ve had to confront some questions I hadn’t really considered in a few years. And I’ve realized I have opinions about these things. That’s what this talk is about.
So how do you choose technology? This was eventually more or less handled for me at Etsy. Now I need to worry about it again.
You can achieve anything with software. And I definitely believe that companies don’t usually succeed or fail because of specific technical choices. But technology choices are relevant. They affect how straight the path is between you and achieving your goals. They affect your efficiency.
Another question that I care about is: how do you make developers happy? This matters to me, as a developer. But also as a leader in a software organization—productivity and retention rely on this.
If you ask developers, a lot of them will tell you they’re happy when they’re working with technology that’s exciting. Or another thing they say is that they like to work on hard problems. Those things may or may not be true. But what I’ve learned with experience is that it’s not really the case at the highest levels of fulfillment. They don’t talk about nodejs in heaven.
You can probably tell from the title of this talk that I don’t think that chasing shiny technology is right. But look, I’ve been there. I, too, once chased shiny technology.
Etsy early on was a big ball of PHP, written by an overall brilliant person who was unfortunately learning PHP as he was writing it. I spent years trying to avoid dealing with the results of that. At one point I tried building Scala services that talked to MongoDB. I wrote blog posts about this that Etsy employees are still giving me shit about. And with good cause.
I think it’s fair to say that I’m a completely different kind of engineer now. I tend to be focused on things that are only vaguely engineering. I talk at design conferences, or in the “business” track. I care a lot more about product than your average engineer.
I view this less as the result of getting old and cranky and more as the result of climbing up Maslow’s hierarchy of needs. Maslow’s hierarchy, briefly, is the idea that you have to satisfy your more basic needs before higher levels of intellectual fulfillment are possible.
The same is basically true about software. You can’t ask intelligent questions about the direction of the product if you’re worried about which database to use or which alerting system to use. In my career to date, I’ve been pretty lucky to have my most basic needs fulfilled. And I want to help get others to this state.
So, try to think of me as a time traveler from your future. I’ve been through the shiny technology wars you might be fighting today. It’s better over here. The air is fresher. Food tastes better.
So, on to the problem of choosing technology. A thing that I think is obviously true is that as human beings, we have limited attention. We can only worry about so much stuff at one time.
I personally model that like this. You could say that we all get a limited number of innovation tokens to spend. This is a construct I just made up, but I think it’s helpful. And since I created this currency I also decided to put Elon Musk on it. These represent our limited capacity to do something creative, or hard. We really don’t have that many of these to allocate. Early on in a company’s life, we get like maybe three. Not too many more than that.
So what’s your company trying to do? Well, Etsy, where I used to work, is trying to reshape the world economy.
I dunno, that sounds like a big job. That probably requires at least one of your tokens.
The company where I work now is trying to increase the GDP of the internet.
Again, that sounds like a pretty complicated thing to be doing. We probably have to spend at least one of our tokens on that. Maybe two. Maybe all of them!
If you think about innovation as a scarce resource, it starts to make less sense to be on the front lines of innovating on databases. Or on programming paradigms. The point isn’t really that these things can’t work. Of course they can work. But exciting new technology takes a great deal more attention to work than boring, proven technology does.
To get at the reason for that I want to talk about the philosophy of knowledge a little bit. What can we know about a piece of technology? This is not actually a frivolous question. It’s really important.
Now look, I don’t like Donald Rumsfeld. But he’s associated with the following, which is thoroughly relevant to our subject.
And that’s this. When we don’t know something, there are really two different categories that that lack of knowledge can be in. There are known unknowns, that is, things that we know that we don’t know. And there are unknown unknowns, things that we don’t know and that we don’t know that we don’t know.
This applies in technology. This is an example of a known unknown. For a given database, we might not know what happens when a network partition occurs. But we know that a network partition is possible. Since we know that this is possible, we can test for this. Or we can just cross our fingers and hope that it doesn’t happen. Either way, we are informed about the possibility.
There are also unknown unknowns in technology. This is a good example I saw a few months ago. This person had a java process that was writing stats to a file, and that was causing GC pauses. It took him forever to figure this out because the possibility hadn’t occurred to him. That’s an unknown unknown.
Now, it’s important to realize that both categories are present in all software. There are always bugs that nobody knows about, even in software that’s been around forever.
But it’d be wrong to say that all technology is therefore equivalent. New technology has larger magnitudes for both of these sets. New tech typically has more known unknowns, and many more unknown unknowns. And this is really important.
Boring technology in a nutshell is technology that’s well understood. We know what it’s capable of, and at least as importantly, we also know what it’s not capable of. We know how boring technology fails.
So, ok, all you have to do is pick proven technology, and you’re all set, right? Well, no. The combination of things that you choose also matters.
Let’s say that you’re already using this stack. You have python, memcached, mysql, and apache.
Let’s say you have a new problem to solve. Do you think it makes sense to add ruby to your existing stack?
I think most people’s intuition there is “probably not.” We know that the marginal utility of adding ruby isn’t going to outweigh the complexity hit we take by adding it. Python and ruby feel pretty equivalent.
And we’ve had formal proofs since the 1930s that any problem you can compute a solution to with one can in principle be solved with the other.
Ok, so how about adding redis? We already have mysql and memcached, but should we add redis?
About here is where people lose it and start beating the polyglot programmer drums. There’s something about the idea of adding a new database that has people storming the Bastille, saying “you can’t stop us from using the best tool for the job!” People tend to think that what they’re doing when they acquiesce to this is that they’re giving developers freedom. And sure, it is freedom, but it’s freedom very narrowly defined.
What’s going on there? Let’s try to tease this apart.
This is what we’re implicitly saying when we want to add a piece of technology. Except in relatively rare cases where it’s not possible to solve a problem with our existing stack, we’re saying that the new tech is going to be so much better in the near term that this benefit outweighs the cost of having two pieces of technology around in perpetuity.
We can actually start to formalize this idea, and think about it in a structured way.
Well, sort of. I don’t expect to see this published in ACM. But here goes.
Your job is basically what my friend Coda says, here. You’re supposed to be solving business problems with technology.
We can model that as a bipartite graph. On the left side we have business problems, and on the right side we have technical solutions.
As practitioners we have to try to connect all of the nodes on the left side so that our problems are solved. Adding an edge here is making a technology choice.
Every choice has a maintenance cost, but we also get the benefit of the technology that we choose.
Every choice has maintenance costs, but every choice also helps us solve the problem. So we have a nonzero benefit, and a nonzero cost for every choice.
When we add more than one edge, we can make a choice. We can use the same technology that we’ve already paid for …
Or we could pick a different piece of technology. We have to pay for that new tech, too, but maybe we get so much development velocity that it’s worth it.
We can start to think about this mathematically. We’re trying to minimize this cost function. The total cost of our operations is all of the maintenance costs we take on from our choices, minus the development velocity we get from every choice.
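One way to write that cost function down, as a rough sketch rather than anything rigorous. The symbols are assumptions introduced here: E is the set of edges (problem p solved with technology t), T(E) is the set of distinct technologies those edges touch, m(t) is the ongoing maintenance cost of a technology, and v(p, t) is the development velocity gained from a choice. Charging m(t) once per technology, rather than once per edge, is what captures the idea of reusing something you have already paid for.

```latex
\[
\mathrm{Cost}(E) \;=\; \sum_{t \in T(E)} m(t) \;-\; \sum_{(p,\,t) \in E} v(p, t),
\qquad
T(E) = \{\, t : (p, t) \in E \,\}
\]
```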
The way we behave really depends on what you believe about which term dominates this equation in the real world. If technology is really expensive to operate, the costs dominate. If technology really makes a huge difference in how easy your job is, the benefits dominate.
So, depending, you might decide to make an allocation like this. Here we’ve picked many different technologies to use to solve all of our problems.
And that makes complete sense if each additional technology choice is cheap. If we think that we get more out of using each new technology than we’ll pay for operationalizing it, then doing it this way makes sense.
This is an alternative strategy. Here we’ve chosen just a few technologies,
And that’s what we should do if we think that each technology we add comes with a lot of baggage.
Here in reality, new technology choices come with a great deal of baggage.
This is reality. Costs to operate a technology in perpetuity tend to outstrip the convenience you get by using something different.
So this tends to be the right way to do it. We should generally pick the smallest set of tech that lets us get the job done.
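The two strategies can be compared with a small Python sketch. Everything in it is hypothetical: the problem names, the numbers, and the assumption that upkeep is paid once per distinct technology while velocity is gained once per choice. It is only meant to show how the answer flips depending on which term dominates.

```python
def total_cost(assignment, maintenance, velocity):
    """Total cost of a set of technology choices.

    assignment:  dict mapping each problem to the technology chosen for it
    maintenance: dict mapping each technology to its ongoing upkeep cost
    velocity:    dict mapping (problem, technology) to the benefit of that choice
    """
    # Assumption: upkeep is paid once per distinct technology,
    # no matter how many problems it is used for.
    upkeep = sum(maintenance[tech] for tech in set(assignment.values()))
    benefit = sum(velocity[(problem, tech)] for problem, tech in assignment.items())
    return upkeep - benefit


# All numbers below are invented for illustration.
velocity = {
    ("orders", "mysql"): 5,
    ("page_cache", "memcached"): 4,
    ("feeds", "mysql"): 3,   # workable, but awkward
    ("feeds", "redis"): 5,   # the "best tool for the job"
}

many_techs = {"orders": "mysql", "page_cache": "memcached", "feeds": "redis"}
few_techs = {"orders": "mysql", "page_cache": "memcached", "feeds": "mysql"}

expensive_upkeep = {"mysql": 10, "memcached": 4, "redis": 8}
cheap_upkeep = {"mysql": 1, "memcached": 1, "redis": 1}

for label, upkeep in [("expensive", expensive_upkeep), ("cheap", cheap_upkeep)]:
    print(label,
          "many:", total_cost(many_techs, upkeep, velocity),
          "few:", total_cost(few_techs, upkeep, velocity))
```

With the expensive upkeep figures the smaller stack comes out cheaper (2 versus 8). With the cheap ones the larger stack wins narrowly (-11 versus -10). The argument here is that real operations look much more like the first case.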
That’s the case because operating a piece of technology at a professional level turns out to be really hard. It’s easy to get started with a lot of technology, but harder to do a really good job with it.
This is why. Adding the technology is easy, living with it is hard. These are all the things you have to worry about.
Polyglot programming is not the kind of freedom we are looking for. If you’re giving individual teams or individual engineers free rein to make local decisions about infrastructure, you’re hurting yourself globally. It’s handing developers the chains so that they’re free to imprison themselves with operational toil, forever.
There’s more to this than just avoiding operational overhead. By embracing polyglot programming, you’re also discarding real benefits that only arise when everyone’s using a shared platform.
A good example of this from my experience is Etsy’s activity feeds. I built this with a small team back in 2010.
Here’s a totally reasonable way to build activity feeds, if that’s all you’re trying to do. You could write events to mysql, aggregate them into a feed offline, stuff the feed into redis, and then serve the feeds to end users from redis. This would totally work great.
But when we set out to build activity feeds, we didn’t have redis. We did have memcached. They’re sort of similar but they have very different guarantees. The most relevant difference to us here is that redis is persistent, and memcached isn’t. We didn’t add redis to our stack to make activity feeds. We made do with what we had.
And that required a good bit of extra effort up front. Since memcached isn’t persistent, we had to write a bunch of extra code to possibly generate the feed fresh for frontend requests. We couldn’t just assume that the feed would exist when the user came to the site. That was hard work we wouldn’t have had to do if we’d added redis, but we got through it.
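The shape of that extra code is roughly a read-through cache: try the cache, and rebuild the feed on a miss. The Python sketch below is not Etsy’s implementation. `VolatileCache`, `build_feed`, and `get_feed` are hypothetical names, the cache is an in-memory stand-in for memcached, and the event list stands in for rows in mysql.

```python
class VolatileCache:
    """In-memory stand-in for memcached: entries may disappear at any time."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss

    def set(self, key, value):
        self._data[key] = value

    def evict(self, key):
        # Simulates the cache dropping an entry (restart, eviction, etc.).
        self._data.pop(key, None)


def build_feed(user_id, events):
    """Aggregate raw events into one user's feed, newest first."""
    mine = [e for e in events if e["user_id"] == user_id]
    return sorted(mine, key=lambda e: e["ts"], reverse=True)


def get_feed(user_id, cache, events):
    """Serve the feed from cache, regenerating it if the cache lost it."""
    key = f"feed:{user_id}"
    feed = cache.get(key)
    if feed is None:
        # The cached copy can never be assumed to exist, so every
        # frontend request needs this fallback path.
        feed = build_feed(user_id, events)
        cache.set(key, feed)
    return feed


events = [
    {"user_id": 1, "ts": 10, "text": "favorited an item"},
    {"user_id": 2, "ts": 11, "text": "opened a shop"},
    {"user_id": 1, "ts": 12, "text": "followed a shop"},
]
cache = VolatileCache()
first = get_feed(1, cache, events)
cache.evict("feed:1")            # the cache forgets the feed
again = get_feed(1, cache, events)  # rebuilt from the source of truth
```

With a persistent store the `if feed is None` branch could arguably be skipped. With a volatile cache it is the part that has to be right.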
Then we walked away. We didn’t do anything related to activity feeds for years after that.
But a funny thing happened. The usage of activity feeds exploded by 20 times. And it was totally fine. This is the greatest purely technical achievement in my entire career.
The reason it was totally fine was because we used the shared stack. We had to plug in more mysql shards and memcached boxes, but people were doing that anyway.
If we’d done redis just for activity feeds, you can be sure that redis would have become distressed as the feature scaled up 20 times. And we would have had to go back and work on redis just to keep activity feeds working.
Or more likely, someone else would have had to do it. Our team didn’t exist at all a year later; we were all working on different things. Making a mess for others to clean up strikes me as even worse. That’s what you’re doing by adding a piece of technology that makes sense locally.
This is an example, but it’s not an absolute principle. Obviously sometimes it does make sense to add new technology to your stack.
So I wanted to finish by talking about how we should go about doing that.
First of all, it’s important to recognize that adding technology is a process. Technology has global effects on your company, so it isn’t something that should be left to individual engineers. I don’t care if you’re a flat organization, a holacracy, or if you have 500 middle managers. You have to figure out how to talk to each other before you add new technology.
When we were all using real hardware, it was usually the case that talking to at least one other person was necessary before adding something new. Now everybody’s on AWS, and this is no longer true. Engineers can sit in a corner and proliferate new systems all day. I don’t think that real hardware is a good thing on balance, but I do think that talking to people is a positive thing. We just have to work harder to do this now, and have those conversations on purpose.
The first question you should talk about is how you’d solve the problem without adding anything new.
I think that you’ll notice that pretty often, this is enough to end the conversation. Because a high percentage of the time, the problem to be solved is that someone wants to use a new piece of tech for its own sake. You should not entertain this impulse as a serious person.
But anyway, assuming that you have a real problem, the answer is rarely that you can’t do it. If you have a functioning website of any kind and you think you can’t accomplish a specific new feature with what you already have, you’re probably just not thinking hard enough. You may need to resort to unnatural acts, but you can get pretty far with a minimal stack.
Again, you might have to do really awkward things, and it’s possible that those are too costly. But you should talk about and write down what those things are.
And if you decide to try out a new piece of technology, you should figure out low-risk ways to get started. Your tactic should not be to rewrite your entire application with it in one step. You should be proving the technology in production with minimal risk, and then gradually gaining confidence in it.
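One common way to do that, sketched here in Python as an illustration rather than a prescription, is to send a small, stable percentage of traffic to the new system and fall back to the old one on any failure. The function names and the dict-backed stores are hypothetical stand-ins for real clients.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets.

    The same user always lands in the same bucket, so raising `percent`
    only ever adds users to the rollout.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def read_value(user_id, key, old_store, new_store, percent):
    """Read from the new store for users in the rollout, else the old one."""
    if in_rollout(user_id, percent):
        try:
            return new_store[key]
        except Exception:
            # Any failure in the unproven system falls back to the
            # proven one instead of reaching the user.
            pass
    return old_store[key]


old_store = {"greeting": "hello from the old store"}
new_store = {"greeting": "hello from the new store"}

# percent=0 sends nobody to the new store; percent=100 sends everybody.
print(read_value("user-42", "greeting", old_store, new_store, percent=0))
print(read_value("user-42", "greeting", old_store, new_store, percent=100))
```

Starting at a low percentage and raising it as confidence grows keeps the blast radius small while the technology is still unproven.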
But ultimately, if you’re adding a redundant piece of technology, your goal is to replace something with it. Your goal shouldn’t be to operate two pieces of technology that are redundant with one another forever—commit to replacing what you have, or don’t add the technology.
So, in closing
This is what you should do, most of the time. Choose technology that’s well understood, with failure modes that are known.
Use technology that lets you focus your attention on what really matters.
Don’t choose tech because of testimonials on Hacker News. Hacker News is kind of like Fox News, and not just because it’s dominated by libertarians. Something terrible is happening somewhere in the world all the time, so cable news always has a story. Someone’s porting their site to a NoSQL database right now, and they’ll write an unreasonable blog post about it that will be on HN. It’s unreasonable to extrapolate in either of these scenarios.
Choose a few globally optimal technologies. Don’t make local decisions. Be kind to your future coworkers. Be kind to your future selves.
It’s important to master the tools that you do pick.
Every piece of software has this curve to some degree. When you start out you encounter a bunch of problems, but you expect to get them ironed out over time.
There’s a natural tendency to want to give up on something in its infancy. When you’ve got a lot of problems with a thing, people freak out and want to switch to something else. If you encounter this and you’re naive, it can lead to a lot of wreckage. If you do one project with one database, encounter some of its quirks and then immediately give up, you can pretty rapidly wind up with ten different databases in production.
If you do that you miss out on the part of the curve that we call “mastery.” It’s possible that given enough time with something, you can reach a state of minimal problems. Probably not zero problems, but the situation will feel like it’s stabilized. Now the given curve here, both the magnitude and the shape of it, varies across different kinds of technology. It’s true that you’re probably going to have a better time with mysql than with mongodb. But you’re not going to have zero problems with mysql, and you should not expect that on the path to mastery.
There’s an unfortunate dilemma inherent in mastering your tools: having done that, you know where the bodies are buried. Familiarity with tools can breed contempt.
There are tradeoffs with every tool. You always have things that are good,
and you have things that aren’t great. That’s just the reality of mapping technology solutions onto problems in imperfect ways.
Human nature is to obsess about the pain points. Or at least this is my nature. I think a lot of engineers suffer from the same thing, though, and technology doesn’t help. We don’t usually set up alerts reminding us about how well everything is going, if we all just step back and reflect. Although that’s a good idea for your next hack week.
So it’s also human nature to look at another piece of technology and notice that it solves a couple of those pain points. And this is the definition of naïveté in engineering.
Because as we’ve seen, we might not even think to ask a bunch of questions about a new piece of tech that we should be asking.
There can be a lot of pain points hidden in our own blind spots.
So we recognize that we have all of these cognitive issues: we’re susceptible to the green grass fallacy. We know we will tend to give up on our tools too quickly. We’re all people who got into this business because we like technology, and that will lead us to chase shiny new stuff. Humans are amazing animals that have figured out a method for containing the damage created by our own psychology. It’s called “society.” The way we protect ourselves from our own natures is to have a process. Don’t let technology choices happen without discussion. Have a process.
Real happiness comes from what you can do after conquering technical choices, not from what you get from making technical choices. There’s a tendency among programmers to think that if they’re writing code, by definition they’re not wasting their time. This is a tar pit.
Real happiness comes from achieving your higher-level goals. Not from solving interesting technical riddles that you create for yourself.