How I became an SRE (and what it’s like)

Originally published at https://medium.com/@betz.mark/how-i-became-an-sre-and-what-its-like-8315b6eccccf

In twenty-five years as a professional software person I’ve done quite a few things, but they have all focused on, or at least started with, writing code. I am basically a programmer. Somewhere along the way, around the 2000s if I recall, the term “software engineer” became the fashionable title. I always felt a little silly using it because I don’t have a degree in software anything, and my Dad is an actual engineer with stuff hanging on his walls to prove it. I didn’t go to family parties and talk about what an engineer I was. In fact I’m not sure I ever actually had the word “engineer” in my role until now. In this post I’m going to talk a little bit about how that changed.

Back in 2015 I had just finished up a gig writing a specialty search engine from the ground up, working on a two-man team with friend and repeat colleague Joey Espinosa. With just two of us working for a somewhat tech-savvy business person, that project was hands-on full-stack everything. We did the data layer, scraping engine, customized spiders for horrible ancient broken sites, web layer, networking, admin, everything. We deployed the app in docker containers using custom scaffolding on AWS instances. It was a ton of fun almost all the time, but business-wise it went nowhere.

Joey left first, and not long afterward I followed him to another small startup. I interviewed as a python back-end developer, but shortly after joining the company we both somewhat mystically found ourselves comprising the devops team. Having a separate devops team seemed somewhat antithetical to my understanding at the time of what devops meant (making your devs do ops). Nevertheless, what we were asked to do, which was to build out a deployment framework for the company’s microservices, sounded fun. Having learned a few lessons on AWS, we didn’t want to just throw containers at the cloud, so we pitched a new platform called kubernetes and were given the green light to build out a POC and deploy some services to it.

That was the point at which kubernetes proved its worth to me. I was simply blown away by the things you could do with it, even at that early stage. Unfortunately it seemed to intimidate the CTO and we had a lot of trouble winning support to continue with it. In the end it didn’t matter, as the company suffered from poor performance and after about six months I found myself looking again. When I started talking to Olark it was in the context of a back-end development position. Olark has a great hiring process that proceeds from a couple of conversational interviews through a homework project and a work-along day. The homework, which I had a lot of fun with, was pretty full-stack: create a chat server and client with some specific features.

I was about halfway through that project (which I was given a week to complete) when I received a call from Mandy Smith, my awesome contact in HR, asking if I would be interested in a role on their engineering operations team instead. A few things on my resume had caught the eye of Brandon Dimcheff, then Olark’s director of engineering: my early docker and kubernetes experience, and maybe the word “devops” as well. We scheduled a call and I spent 45 minutes or so with him and Aaron Wilson, Olark’s first employee and most senior engineer. I admit I went in skeptical. I can recall an earlier conversation I had with a NYC company about another ops-oriented position:

Interviewer: “Your background is all development. Are you sure you want to do ops?”

Me: “Yeah, I’ve asked myself the same question. I’m afraid if I take a job like this I’ll lose my development skills.”

Interviewer: “Yeah…” [sigh] “… that’s what happened to me.”

That pretty much sums up how most programmers, at least of my generation, viewed any job title that would match on ‘^.*ops.*$’. By the end of our call, however, the challenges and goals that Brandon and Aaron described had really whetted my appetite, and I had agreed to move forward and compete for the engineering operations role. That was almost two and a half years ago and it’s a choice I have never regretted, not even when pagerduty has lit up my phone at 3 AM, something which is thankfully very rare now. For the rest of this post I’ll talk about what it’s like to be an SRE at Olark. Maybe this will be helpful to you if you’re ever considering a similar evolution.

First I should note that “SRE” is not my official title at Olark. We don’t take titles very seriously, so mine is actually “sudo wrestler,” because after I joined I had to make something up. However, my role is labelled as “SRE” on internal charts, and if I had to describe it I would probably say “SRE/devops.” At least one of those terms is pretty well-defined. The SRE role originated at Google and was described in a book that is sort of a bible for engineers at Olark and other companies that run Internet services at significant scale. SREs are concerned with system reliability, obviously, and also with performance, observability, maintainability and many other issues.

Devops is not as easily pinned down, and my own thinking on it has evolved over the years. I would say it is a role inhabited by software developers with broad experience who focus on tooling and techniques for streamlining development and deployment. My work at Olark is definitely a combination of all these things. Since joining I’ve played a part in efforts as varied as migrating our entire stack from a legacy cloud provider to Google Cloud, designing and supporting our kubernetes clusters, developing and tuning continuous deployment processes, deploying and debugging networking solutions, and designing and implementing high-volume logging across all of our environments.

How does this sort of work compare to programming? What’s good about it? What’s not so good? Far and away the best thing about it is the variety of problems I get to tackle. I’ve always had a deep fascination with computers and how they work, and when I started you had to know a lot about hardware and software just to get a system running. The most satisfying thing about programming, for me at least, is that moment when a thing runs. I’ve never gotten over it, and it’s why I still do this for a living some decades later. At Olark I get to experience that feeling over and over: the thrill of victory when you get a new thing going, or fix an existing thing that has stopped working.

That is something a lot of jobs don’t offer you, and I’m more or less addicted to it. I still write plenty of code, but it is more likely these days to be bash scripts, makefiles, and small python or ruby things to glue pieces together. I’m hoping to learn golang at some point, and maybe Elixir. I also spend a lot of time figuring out how to make things work on Google’s cloud platform. We run several projects with instances and kubernetes clusters, so there have been plenty of network architecture and software deployment challenges along the way.

Not everything about this role is awesome, and if you’re going to talk about the downsides then the elephant in the room is the pager. Nobody is ever going to love the pager. Here’s the thing: if you do your job right, if you’re allowed to do your job right, then pages are rare, and when they happen it’s for a serious reason that often requires a team response, so you’re all in it together. At Olark every back-end engineer is in the rotation. It’s a shared burden that we take on because we love working at Olark and feel responsible for the company’s systems. When I first joined it was common to have the phone light up in the middle of the night. As of this writing it’s been over a year since I have had that experience, and I like to think this is because we all share the essential responsibility of keeping our systems up and are all invested in fixing things that cause repeated problems.

The only other thing that I would label a downside is the thing that I miss most about being primarily a programmer: focusing for a long and solitary period of time on hard problems and writing code to solve them. That sort of heads-down intensive work is in my nature. I like it, and I don’t get to do as much of it now. There are more frequent interruptions when part of your role is to be a problem solver, and when you’re writing code it is, as I mentioned above, often coming in smaller chunks to serve very specific purposes. As compensation I get to think about and work on much bigger problems than I did as a purely programming person. And at least some of the desire for turtle time is assuaged by the writing I do here and by side projects I noodle on.

So ultimately the thing I like the best about doing SRE/devops work at Olark is that it requires me to bring to the job literally all of the experience I’ve accumulated in my career. On any given day the problems or goals we take on might involve software, hardware, networking, name resolution, security, debugging, new development, performance analysis, you name it. It all comes up at some point. It keeps the job interesting, and makes it really impossible to fall into a rut because there are no ruts to be seen. Perhaps most important of all it forces you to keep learning new things on almost a daily basis, and that is really the base requirement for survival in this business, whether you have “ops” anywhere in your title or not.
