2017-06-04

WWDC, Again, 2017 edition

Hey, it's time for another WWDC!  Granted, I haven't been doing iOS development in a while, and I find that I'm not eyeing the latest hardware as much as I used to, but it's still at least a little fun.

For a lot of people, there is a lot of fun and money to be had from the Apple Speculation Game.

Personally, I don't care nearly so much.  Historically, I have bought iPhones every 3 years, and iPads nearly as often.  My most recent iPhone is a 6S, so I probably won't buy a new phone this coming year.

But here's what I hope for:
-An iPhone 6S,
-Retaining the headphone jack,
-With more storage available,
-Faster would be fine, but isn't required,
-And rather than be thinner,
-Make the front and back flush with the thickest part (the camera),
-Taking up any extra space with battery.

There.  That's it for iPhone.

I love my 6S.  Its height and width are fine.  I like having a Lightning connector as well as a headphone jack.  I use my headphone jack REGULARLY.  I have several different pairs of headphones.  While sitting at work, I often move the pair on my head to a different device; I use different headphones for different things.  Likewise, I may swap which headphones are plugged into my device.

Bluetooth does not give me that flexibility.  The cords don't bother me that much.  I will probably buy a nice pair of Bluetooth headphones in the next year or so, but I don't want it to be my only option.

As far as the body being flush on front and back, I hearken back to my iPhone 5 (as well as the 4/4S and 5S phones).  My 6S is the first phone that I felt needed a case.  I had my 3GS and 5 on my person day-in and day-out for each of their 3-year lives.  While the backs were a little worn, that didn't bother me.  I loathed the idea of making the device thicker just to protect something that I had no intention of selling; since I wasn't going to sell it, I didn't need to protect it.  They each suffered a couple of falls, but nothing damaging.

My 6S, on the other hand, drove me nuts, because the camera sticks out.  The phone does not lie flat on a table.  So, in order to keep from damaging the camera, or causing stress to the phone when laying it on a table repeatedly, I bought a case.  I got a leather case which I'm quite pleased with, but I hate that I needed it.

I would much rather have a thicker phone with better battery life.

Apple's product lineup from this last year disappointed me.  I didn't care about the iPhone 7, but I wasn't going to get a new iPhone unless my current one broke.  But I wanted to buy a new Mac Laptop, or a cheap laptop, and an iMac.  Typically I would buy the mid-level 15" MBP, but this year I'd intended to buy a pretty much maxed-out 13".  But then Apple "Innovated".  "Bravely."

I don't know anyone who cares about the touch-bar.  I know a lot of people who care about a goddamn Escape Key.  I care about the Escape Key.

I know people who want an Apple Desktop.  I was ready to buy an Apple Desktop.  Guess who didn't buy an Apple Desktop.

In fact, I was so disgusted by Apple's lineup that I went all the way around and found what is almost my ideal setup:  A Razer Blade Stealth with the Razer Core.  This is an Ultrabook paired with a Thunderbolt 3 dock that lets me run a desktop GPU.  I love this setup.

My two complaints are 1) the trackpad isn't as nice as Apple's, and 2) it's running Windows.  Now, this isn't a knock on Windows, per se; I administered Windows for a long time.  I can make it work.  But I'd rather be running MacOS, by a long shot.  Aside from gaming, my main application categories are a web browser, terminal, and text editor.  Editors and browsers are fine on Windows, but I want ZSH with oh-my-zsh, my .zshrc, and a Unix toolchain.  Yes, I can install a *nix distro on it, but I'd rather have MacOS without having to jump through a shitload of hoops.

I'm sorry, but Synaptics is not nearly as good as Apple's touchpad.  I wish it weren't the case.

The one upside is that the number of usable games in my Steam library doubled immediately, and I've been able to play a fair number of games that used to be completely inaccessible to me before.

Too bad Apple couldn't provide something that I could use.  I needed a laptop, but all Apple provided were severely compromised options.

Maybe this year will have better options.





2017-03-16

Dear Vendors

Dear Vendor,

Since I didn’t see an unsubscribe link, I replied with the subject changed to “Unsubscribe”.

That didn’t mean reply some more.

When I replied to that with the subject and body changed to “Unsubscribe”, that was a less-subtle hint.

When you reply to that (maybe including a quote or some byline of mine) with anything other than “sorry to trouble you”, I read that as “please block and report me as spam.”

I know that you're a person with a job to do, but when I express a lack of interest, please take the hint.  Let's not waste both of our time.

2016-12-23

That Product Team Really Brought The Room Together

As with last year, I participated in SysAdvent, a tradition in which Systems Administrators each write a blog article of their choosing.  Similar to a Conference, a Call for Proposals is published, and if interested, you propose a Topic (a title and brief description are all that are required, if I recall correctly).  My proposal was accepted, an editor was assigned, and due dates were set.

I had a lot of notes on things that I wanted to say, but I struggled to construct a coherent narrative: I had lots of smallish topics that overlapped heavily, like a Venn diagram, but didn't present linearly in my mind.  Because this is something of a follow-up to last year's article/talk, there's a lot of related material, but also a lot omitted for time and space reasons.  (I needed to end it _somewhere_...)

Again, I offer my thanks to my editor Cody and the rest of the SysAdvent team for maintaining this tradition, and I sincerely appreciate the work that goes into keeping it going.  While I'd had this bunch of ideas floating around in my head (largely as a conference presentation), it was their work that made me sit down and write it out as prose, rather than a slide deck.

Now, with permission from the author, I present:  



Day 23 - That Product Team Really Brought The Room Together



Written by: H. “Waldo” Grunenwald (@gwaldo)
Edited by: Cody Wilbourn (cody@codywilbourn.com)

There are plenty of articles talking about DevOps and Teamwork and Aligning Authority with Responsibility, but what does that look like in practice?
Having been on many different kinds of teams, and having run a Product Team, I will talk about why I think that Product Teams are the best way to create and run products sustainably.

HEY, DIDN’T YOU START WITH “DEVOPS DOESN’T WORK” LAST TIME?

Yes, (yes I did). And I believe every word of it. I consider Product Teams to be a definitive implementation of “Scaling DevOps”, which so many people seem to struggle with once the number of people involved grows beyond a conference room.
To my mind, Product Teams are the best way to ensure that responsibility is aligned with authority, that the applications you need are operated sustainably, and that a given application is less likely to become “Legacy”.

What do you mean “Legacy”?

“Legacy” is a term that we use all the time in this industry, but I don’t think that I’ve ever seen it well-defined. In my mind, a Legacy Product is:
  1. Uncared For: Not under active development. Any releases are rare, using old patterns, and are often the result of a security update breaking functionality, causing a fire-drill of fixing dependencies.
  2. In an Orphanage: The people who are responsible for it don’t feel that they own it, but are stuck with it.
If there is a team that actively manages a legacy product, they might not be really equipped to make significant changes. Most of the time they are tasked only with keeping this product barely running, and may have a portfolio of other products in similar state. This “Legacy Team” might have some connotation associated with it of being “second-string” engineers, and it might be a dumping ground for many apps that aren’t currently in active development.

What are we coming from?

Assume there is a product or service that is defined by “business needs”.
Someone decides that these goals are worthwhile, and a Project is defined.
This may be a new product or service, or it may be features for an existing product or service. At some point this Project goes into “Production”, where it is hopefully consumed by users, and hopefully it provides value.

Here’s where things get tricky.
In most companies, the team that writes the product is not the same team that runs the product. This is because many companies organize themselves into departments. Those departments often have technical distinctions like “Development” or “Engineering”, “Quality Assurance”, and “Operations” and/or “Systems” groups. In these companies, people are aligned along job function, and each group is responsible for one phase of a product’s lifecycle.
And this is exactly where the heart of the problem is:
The first people who respond to a failure of the application aren’t the application’s developers, creating a business inefficiency:
Your feedback loop is broken.

As a special bonus, some companies organize their development into a so-called “Studio Model”, where a “studio” of developers work on one project. When they are done with that project, it gets handed off to a separate team for operation, and another team will handle “maintenance” development work. That original Studio team may never touch that original codebase again! If you have ever had to maintain or operate someone else’s software, you might well imagine the incentives that this drives, like assumptions that everything is available, and latency is always low!
See, the Studio Model is patterned after Movie and Video Game Studios. This can work well if you are releasing a product that doesn’t have an operational component. Studios make a lot of sense if you’re releasing a film. Some applications, like single-player Games and Mobile Apps that don’t rely on Services, are great examples of this.
If your product does have an operational component, this is great for the people on the original Studio team, for whom work is an evergreen pasture. Unfortunately it makes things more painful for everyone who has to deal with the aftermath, including the customers. In reality it’s a really efficient way of turning out Legacy code.
Let’s face it, your CEO doesn’t care that you wrote code real good. They care that the features and products work well, and are available so that they bring in money. They want an investment that pays off.
Having Projects isn’t a problem. But funding teams based on Projects is problematic. You should organize around Products.

OK, I’LL BITE. WHAT’S A PRODUCT TEAM?

Simply put, a Product Team is a team that is organized around a business problem. The Product Team is made up of people such that it is largely Self-Contained, and collectively the team Owns its own Products. It is “long-lived”: the intention is that the team is left intact as long as the product is in service.
Individuals on the team will have “Specialties”, but “that’s not my job” doesn’t exist. The QA Engineer specializes in determining ways of assuring that software does what it’s expected to. They are responsible for the writing of useful test cases, but they are not limited to the writing of tests. Notably, they’re not solely responsible for the writing of tests. Likewise for Operations Engineers, who have specialties in operating software, infrastructure automation, and monitoring, but they aren’t limited to or solely responsible for those components. Likewise for Software Engineers…
But the Product Team doesn’t only include so-called “members of technical staff”. The Product Team may also need other expertise! Design might be an easy assumption, but perhaps you should have a team member from Marketing, or Payments Receivable, or anyone who has domain expertise in the product!
It’s not a matter of that lofty goal of “Everyone can do everything.” Even on Silo teams, this never works. This is “Everyone knows enough to figure anything out” and “Everyone feels enough ownership to be able to make changes.”
The people on this team are on THIS team. Having or being an engineer on multiple teams is painful and will cause problems.

You mentioned “Aligning Authority with Responsibility” before…

Because the team is close-knit and long-lived, certain understandings need to be in place. What I mean is that if you want to have a successful product and a sustainable lifecycle, there are some understandings that need to take place with regard to staffing:
  • Engineers have a one-to-one relationship to a Product Team.
  • Products have a one-to-one relationship with a Product Team.
  • A Product Team may have a one-to-many relationship with its Products.
  • A Product Team will have a one-to-one relationship with a Pager Rotation.
  • An Engineer will have a one-to-one membership with their team’s Pager Rotation.
Simply put, having people split among many different teams sounds great in theory, but it never works out well for the individuals. The teams never seem to get the attention required from the Individual Contributors, and an Individual Contributor is effectively doubling their number of bosses and having to appease them all.

Pager

Some developers might balk at being made to participate in the operation of the product that they’re building. This is a natural reaction.
They’ve never had to do that before. Yes, exactly.
That doesn’t mean that they shouldn’t have to. That is the “we’ve always done it this way” argument.

This topic has already been well-covered in another article in this year’s SysAdvent, in Alice Goldfuss’ “No More On-Call Martyrs”, itself well-followed up by @DBSmasher’s “On Being On-Call”.
In this regard, all I’ll say is that if one’s sleep is on the line (if you are on the hook for the pager), you will take much more care with your assumptions when building a product than if that is someone else’s problem.
The last thing that amazes me is that this is a pattern that is well-documented in many of the so-called “Unicorn Companies”, whose practices many companies seek to emulate, but somehow “Developers-on-Call” is always argued to be “A Bridge Too Far”.
I would argue that this is one of their keystones.

WHO’S IN CHARGE

Before I talk about anything else, I have to make one thing perfectly clear. If you have a role in Functional Leadership (Engineering Manager, Operations Director, etc), your role will probably change.
In Product Teams, the Product Owner decides work to be done and priorities.
Within the team you have the skills that you need to create and run the product, delegating functions that you don’t possess to other Product Teams. (DBAs are somewhat rare, and “DB-as-a-Service” is somewhat common.)
Many Engineering and Operations managers were promoted because they were good at Engineering or Ops. Unfortunately, that is when it sets in that, in Lindsay Holmwood’s words, “It’s not a promotion, it’s a career change”, a point also addressed in this year’s SysAdvent article “Trained Engineers - Overnight Managers (or ‘The Art of Not Destroying Your Company’)” by Nir Cohen.
How many of you miss Engineering, but spend all of your time doing… stuff?
Under an org that leverages Product Teams, Functional Leaders have a fundamentally different role than they did before.

Leadership Roles

Under the Product Team paradigm, Product Managers are responsible for the work, while Functional Managers are responsible for passing on knowledge and overseeing the career growth of Individual Contributors.

Product Managers         | Functional Managers
------------------------ | ------------------------------------
Owns Product             | ICs’ Professional Development
Product Direction        | Coordinate Knowledge
Assign Work & Priority   | Keeper of Culture
Hire & Fire from Team    | Involved in Community
Decide Team Standards    | Bullshit Detector / Voice of Reason

Product Managers

The Product Manager “Owns the Product”. They are ultimately responsible for the product successfully meeting business needs. Everything else is in support of that. I must stress that it isn’t necessary that a Product Manager be technical, though it does seem to help.
The Product Owner is the person who understands the business goals. With that knowledge and those stakes, they assign work and priorities so that the work is aligned with those business goals.
Knowing the specific problems that they’re solving and the makeup of their team, they are responsible for hiring and firing from the team.
Because the Product Team is responsible for their own success, and availability (by which I mean, of course, the Pager), they get to make decisions locally. They get to decide themselves what technologies they want to use and suffer.
Finally, the Product Manager evangelizes their product for other teams to leverage, and helps to on-board them as customers.

Functional Managers

At this point, I expect that the Functional managers are wondering “well what do I do?” Functional Managers aren’t dictating what work is done anymore, but there is still a lot of value that they bring. Their job becomes The People.
I don’t know a single functional manager who has been able to attend to their people’s professional development like they feel that they should.
Since technology decisions are made within the Product Team, the Functional Management has a key role in coordinating knowledge between the members of their Community, keeping track of who’s-using-what, and the relevant successes and pitfalls. When one team is considering a new tool that another is using, or a team is struggling with a tech, the functional manager is well-equipped for connecting people.
Functional Managers are the Keepers of Culture, and are encouraged to be involved in Community. That community-building is both within the company and in their physical region.
Functional managers are crucial for Hiring into the company, and for helping Product Managers hire for skills that they aren’t strong in. For instance, I would run a developer candidate by a development manager for a sanity-check, but for a DBA, I’d be very reliant on a DBA Manager’s expertise and opinion!
Relatedly, the Functional Manager serves as a combination Bullshit Detector and Voice-of-Reason when there are misunderstandings between the Product Owners and their Engineers.

The Reason for Broad Standards

Broad standards are often argued for one of two main reasons: either for “hiring purposes”, where engineers may be swapped relatively interchangeably, or because there is a single Ops team responsible for many products that doesn’t have the ability to cope with multiple ways of doing things. (Since any one Engineer might be called upon to troubleshoot many apps in the dark of the night.)
Unfortunately, app development can often be hampered by those Standards that don’t fit their case and needs.
Hahahaha I’m kidding! What really happens is that Dev teams clam up about what they’re doing. They subvert the “standards” and don’t tell anyone, either pleading ignorance or claiming that they can’t go back and rewrite because of a deadline. Best case is that they run a request for an “exemption” up the flagpole, where Ops gets overridden. And Operations is still left with a “standard” and a pile of “one-offs”.

Duplicate Effort

Another claimed reason for broad “Standards” is to “reduce the amount of duplicated effort”. While this is a great goal, again, it tends to cause more friction than is necessary.
The problem is the fallacy of assuming that the way a problem was solved for one team will be helpful to another. That solution may be helpful, but assuming that it will be, and making it mandatory, is going to cause unnecessary effort.
At one company, my team ran ELK as a product for other teams to consume. A new team was spun up and asked about our offerings, but also asked my opinion of their using a different service (an externally-hosted ELK-as-a-Service). I was thrilled, in fact! I wanted to see if we were solving the problem in the best way, or even a good way, and to be able to come back later for some lessons-learned!

Scaling Teams

At some point, your product is going to get bigger than everyone can keep in their head. It may be time to split up responsibilities into a new team. But where to draw boundaries? Interrogate them!
A trick that I learned a long time ago for testing your design in Object-Oriented Programming is to ask the object a question: “What are you?” or “What do you do?” If the answer includes an “And”, you have two things. This works well for evaluating both Class and Method design. (I think that this tidbit was from Sandi Metz’s “Practical Object-Oriented Design in Ruby” (aka “POODR”), which I was exposed to by Mark Menard of Enable Labs.)

What Doesn’t Work

Because this can be a change to how teams work, it’s important to be clear about the rules. If there is a misunderstanding about where work comes from, or who the individual contributors work for, or who decides the people who belong to what team, this begins to fall apart.
Having people work for multiple sets of managers is untenable.
Having people quit is an unavoidable problem in any company. Having a functional manager decide by themselves that they’re going to reassign one of your people away from you is worse, because they’re not playing by the rules.

WARNING: Matrix Organizations Considered Harmful

If someone proposes a Matrix Org, you need to be extremely careful. It’s important that you keep a separation of Church and State. Matrix Organizations instantly create a conflict between the different axes of managers, with the tension being centered on the individual contributor who just wants to do good work. A Matrix Org actively adds politics.
All Work comes from Product Management. Functional Management is for Individual Careers and Sharing Knowledge.
This shouldn’t be hard to remember, as the Functional Leaders shouldn’t have work to assign. But it will be hard, because they’ll probably have a lot of muscle-memory around prioritizing and assigning work.
Now, I’m sure a lot of you are skeptical about how a product team actually works. You might just not believe me.
If you properly staff a team, give them direction, authority, and responsibility, they will amaze you.

GETTING STARTED

As with anything, the hardest thing to do is begin.

Identifying Products

An easy candidate is a new initiative for development that may be coming down the pipeline, but if you aren’t aware of any new products, you probably have many “orphaned” products already running within your environment.
As I discussed last year, there are plenty of ways of finding products that are critical, but not actually maintained by anyone. Common places to look are tools around development, like CI, SCM, and Wikis. Also commonly neglected are what I like to call “Insight Tools” like Logging, Metrics, and Monitoring/Alerting. These all tend to be installed and treated as appliances, not receiving any maintenance or attention unless something breaks. Sadly, it means that there’s a lot of value left on the table with these products!

Speaking with Leadership

If you say “I want to start doing Product Teams”, they’re going to think of something along the lines of BizDev. A subtle but important distinction is to say that you want to organize a cross-functional team that is dedicated to the creation and long-term operation of the Product.
I don’t know why, but it seems that executives go gooey when they hear the phrase “cross-functional team”. So, go buzz-word away. While you’re at it, try to initiate some Thought Leadership and coin a term with them like “Product-Oriented Development”! (No, of course it doesn’t mean anything…)
What you’re looking for is a commitment to fund the product long-term. The idea is that your team will solve a related set of problems. The team is made of “Your People”, and becomes a “we”. Oddly enough, when you have a team focused and aligned together, you have really built a capital-T “Team”.

SUSTAINED

The Product Team should be intact and in-development as long as the product is found to be necessary. When the product is retired, the product team may be disbanded, but nobody should be left with the check. Over time, the features should stabilize, the bugs should disappear, and operating the application should settle to a low level of effort, even including external updates.
That doesn’t mean that your engineers need to be farmed out to other teams; you should take on new work, and begin development of new products that aid in your space!

CONCLUSION

I believe that organizing work in Product Teams is one of the best ways to run a responsible engineering organization. By orienting your work around the Product, you are aligning your people to business needs, and the individuals will have a better understanding of the value of their work. By keeping the team size small, they know how the parts work and fit. By everyone operating the product, they feel a sense of ownership, and by being responsible for the product’s availability, they’re much more likely to build resilient and fault-tolerant applications!
It is for these reasons and more, that I consider Product Teams to be the definitive DevOps implementation.

GRATITUDE

I’d like to thank my friends for listening to me rant, and my editor Cody Wilbourn for their help bringing this article together. I’d also like to thank the SysAdvent team for putting in the effort that keeps this fun tradition going.

CONTACT ME

If you wish to discuss with me further, please feel free to reach out to me. I am gwaldo on Twitter and Gmail/Hangouts and Steam, and seldom refuse hugs (or offers of beverage and company) at conferences. Death Threats and unpleasantness beyond the realm of constructive Criticism may be sent to:


Waldo  
c/o FBI Headquarters  
935 Pennsylvania Avenue, NW  
Washington, D.C.  
20535-0001

2016-11-01

On The DevOps Drinking Game

Just for laughs: while preparing my "Fear and Loathing in Systems Administration" conference talk, I saw a real gap in our community's resources.  My research turned up nothing, so I took it upon myself to "be the change that I want to see in the world."

Thus was born a new GitHub project: The DevOps Drinking Game.

Enjoy, and Be Safe!

-Waldo

2016-08-29

On the Loss of a Star

Someone that I don't know died today. Apparently many of my friends have close personal relationships with famous actors and musicians. Apparently they don't invite me along when they hang out with their celebrity-friends. I can't blame them.
Fortunately for me, the works that I knew these celebrities from have been recorded, so that I may continue to enjoy them as I always have.
For those of you who had a relationship with the deceased, I am sorry for your loss.

2015-12-31

Fear and Loathing in Systems Administration

This year I participated in SysAdvent, a tradition in which Systems Administrators each write a blog article of their choosing.  Similar to a Conference, a Call for Proposals is published, and if interested, you propose a Topic (a title and brief description are all that are required, if I recall correctly).  My proposal was accepted, an editor was assigned, and due dates were set.

After fleshing out some further notes and constructing an outline, I set out to procrastinate until a couple of days before my due-to-editor deadline.  My editor, the fantastic Shaun Mouton (@sdmouton), promptly reviewed my content for style, sense, and sanity, and when we came to consensus that it was good enough, he submitted it for publication.

I offer my thanks to Shaun and the rest of the SysAdvent team for maintaining this tradition, and I sincerely appreciate the work that goes into keeping it going.  While I'd had this bunch of ideas floating around in my head (largely as a conference presentation), it was their work that made me sit down and write it out as prose, rather than a slide deck.

Now, with permission from the author, I present:  


Fear and Loathing in Systems Administration

Written by H. “Waldo” Grunenwald (@gwaldo)
Edited by Shaun Mouton (@sdmouton)

“DEVOPS DOESN’T WORK”

The number of times that I’ve heard this is amazing. The best thing about this phrase is that the people who say it are often completely right, even if for very wrong reasons.

Who Says This?

Well, let’s talk about the people who most commonly have this reaction: SysAdmins. I’m going to use the term “SysAdmins” as a shorthand for a broad group. The people in this group have widely varying titles, but most commonly “Systems”, “Network”, or “Operations” followed by “Administrator”, “Engineer”, “Technician”, or “Analyst”.

In some companies, these folks have the best computers in the place. In others, they have to live with the worst. Their workspace probably isn’t very nice, and almost certainly has no natural light. If there is a pager rotation, they are almost certainly on it. If there isn’t a rotation, they’re basically on-call all of the time.

During the course of a normal day they might have to switch contexts between disaster planning, calculating power and HVAC needs for a new datacenter, scrambling to complete an outage-driven SAN migration, rushing to address urgent requests to help people with their email, to troubleshoot a printing problem, or suss out why someone can’t get to their electric company’s bill pay website. They may be the sole person with database expertise in the company, or they may work on a team of dozens.

The work is largely invisible except when something fails, in which case it’s highly visible and widely impacting.

Bug vs Incident

These are typically cynical people, because there are only so many times that you can miss the team/department/company party ostensibly “celebrating our successes” because something’s broken, and you’re left to clean up after the “success”. There are only so many times that one can watch a new project get announced and begin hiring people. When asked who’s going to support the new project, the response is a blank look and “you are”. The “…of course.” may not be vocalized, but it’s probably there. When asked how many people they get to hire to help with the workload, the response is some combination of “sorry, but there wasn’t anything left in the budget”, “it won’t be that much more work”, or a variation of the “team player / good soldier” speech. There are only so many times one can have requests for training or conference budget rejected out of hand, or laughed out of the room.

They probably have basic working knowledge of a half-dozen programming languages, but they most likely think in Shell. They probably know at least three ways of testing if a port is open, and probably have a soft spot in their heart for a couple of shell commands.

They may have seen or participated in a DevOps initiative that consisted of a team or position rename, or someone may have helpfully suggested that they install some Config Management and Monitoring software so that “we can DevOps now…” or “so we can do Agile”. When they hear “DevOps” or “Agile”, what they hear is: “Let’s take the same people who can’t handle a planned release schedule, or make whatever effort they need to squeak by the Change Board and Release Management requirements, and give them unfettered access to Production. Clearly, I’m not paged often enough.”

So what is one to do? How is one to maintain their sanity in the face of increasing job scope, increasing demand for access and velocity, and little hope for an effective new-hire count? Not to mention continuing to juggle the existing volume of requests, and continuing to grease the existing gears to keep the machine running.

GET HELP

Please note that I’m not saying “just”. There’s nothing just about this situation; there is nothing simple about any of this, and Justice hasn’t been seen in a long time in an environment where this is the norm. Most of these changes are difficult. They will take work, and will require convincing other teams to join in your cause.

Admitting you have a Problem

The problem (probably) isn’t technical. It’s almost entirely social.

Because SysAdmins are typically responsible for the environment, the easiest way to ensure that it stays stable is to lock everyone else out of it. While this helps with the goal of “keeping out unexpected changes”, it has a number of side effects.

First, a kind of learned helplessness sets in. Your customers and teammates become so used to being “hands-off” that they lack the wherewithal to meet reasonable expectations. Since they’re uncomfortable making any changes, all changes must be made by the SysAdmins, and your time is consumed by lots of low-value tasks.

Some teams settle on the pattern of “hands off Production, but you have access to Staging”, but this is fraught with peril. The most common problem that stems from this is “Configuration Drift.” Config Drift is when you have different settings in one environment (or server) than in the others. When the cost to discover what Production looks like is high, it’s more likely that people will either use defaults, make assumptions, or use the same configs that they use in their IDEs. “Works on my machine”, indeed.
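To make drift concrete, here is a minimal Python sketch that compares two environments' settings and reports every key that differs, including keys missing on one side. It assumes the configs have already been loaded into flat dictionaries; the setting names are hypothetical. Real Configuration Management tools go further by enforcing a desired state rather than merely diffing, but surfacing the differences is the first step.

```python
from typing import Any, Dict, Tuple

MISSING = object()  # sentinel: the key is absent from that environment


def config_drift(baseline: Dict[str, Any], other: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (baseline_value, other_value)} for every key whose values differ."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(baseline) | set(other)):
        left = baseline.get(key, MISSING)
        right = other.get(key, MISSING)
        if left != right:
            drift[key] = (left, right)
    return drift


# Hypothetical settings for illustration only.
production = {"max_connections": 200, "log_level": "WARN", "cache_ttl": 300}
staging = {"max_connections": 50, "log_level": "WARN", "debug": True}
```

Running `config_drift(production, staging)` flags `cache_ttl`, `debug`, and `max_connections`, and leaves out `log_level`, which matches.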

This is a problem well-solved by Configuration Management tools, but you still need to be willing to trust your peers and give them access. If you want to be part of the process of validating changes, you could put in place the structures that allow a pull-request and code-review workflow, something that your Software Engineering peers should be very accustomed to! Granting access to see the existing configs and the ability to propose changes also shares responsibility for your team’s environments and contributes to feelings of ownership. Denying colleagues the ability to effect necessary configuration changes contributes to the root problems of configuration drift and learned helplessness.

Stop Feeding the Machine

Your value is not in doing the work, but rather in being able to make the decision to do the work.
I’ll be the first to say that “Automating ALL THE THINGS” is a flawed goal. At work, it’s usually said in the context of a Project, rather than as part of a philosophy of Continual Improvement (think Toyota Kata). You shouldn’t have to engage in an “Automation Project” to improve your environment. Build time into your schedule to solve one problem. Pick something that is rough, manual, and repeatable. Remove a small piece of friction. Move on to the next one. Hint: Logging into a server to make a configuration change should be a cue to implement configuration management!

While I agree that everything should be automated, not everything should be automatic. Decision-making is complex, and attempting to codify every possible decision point is a fantastic way to drive yourself insane. Not to mention that documenting your decision-making processes may be an unwanted look inside your brain. Caveat Implementor. (Or perhaps that’s just me…)

All of the units of work should be automated. But the decision to run the now-automated tasks can be left to a human. When you find that there is a correlation between steps, those pieces should be wrapped together. Automation isn’t a project unto itself. It should be iterative. Pick something that’s painful. Make it a little smoother. Repeat. Ideally, you have time blocked out for Continuous Improvement. If not, create a recurring meeting or a weekly project to do so. Review the issues that you’ve experienced lately, and pick something to make better. It might be worth making into a project, but it won’t be an ALL-THE-THINGS project. Create a scope of effort. Take the time to plan goals and requirements.

Whatever you don’t automate must be documented. Beyond the typical benefits of documentation, it also serves as “Functional Requirements” for someone else to pick up when they can help you build a solution. Try to recognize whether documenting or automating a task takes longer. Perhaps a given piece of documentation would be better served by “Executable Documentation” (i.e. code).
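Here is one way to combine “automate the units of work” with “leave the decision to a human”, as a minimal Python sketch. Each step is a function whose docstring doubles as its documentation; the runner shows that text and asks a `confirm` callback before running the step. The step names and the runner itself are hypothetical, and the step bodies are placeholders where real work would go.

```python
from typing import Callable, List


def runbook(steps: List[Callable[[], None]], confirm: Callable[[str], bool]) -> List[str]:
    """Run documented steps in order, asking a human before each one.

    Each step's docstring is shown to the operator as its description.
    Returns the names of the steps that actually ran.
    """
    ran: List[str] = []
    for step in steps:
        description = (step.__doc__ or step.__name__).strip()
        if not confirm(description):
            print(f"Stopped before: {description}")
            break
        step()
        ran.append(step.__name__)
    return ran


# Hypothetical steps; the prints stand in for the real work.
def drain_node() -> None:
    """Remove the node from the load balancer pool."""
    print("draining node...")


def restart_service() -> None:
    """Restart the application service on the node."""
    print("restarting service...")


def ask_operator(description: str) -> bool:
    """Interactive confirmation for use at a terminal."""
    return input(f"{description} Proceed? [y/N] ").strip().lower() == "y"
```

At a terminal you would call `runbook([drain_node, restart_service], confirm=ask_operator)`. The docstrings read like the old wiki page, except they can't drift away from what the code does.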

Clarify Your Role

Try to pick apart the parts of your work and describe them. One way to make this a fun exercise is to use other job titles to describe the work.

Are you an “Internet Plumber”? How much of your job could be described as “Spelunking” into the deep dark caverns of Legacy systems?

If you want, you could ascribe Superhero names to these parts of your work. The added bonus is that it not only describes a role, but also a demeanor associated with them. When ‘bad code’ makes it to Production, do you go “Wolverine” on that dev team?

Could you describe part of your role as “Production Customs Official”? Are you the gateway to Production? If so, are you actually equipped to do that? Here’s a quick test: When you say “no, that can’t go”, do you get overridden?

More importantly, is this what you want to do?

Prepwork

You will need to prepare for this. Most SysAdmin teams do not have a healthy relationship with the rest of the business. You will need to initiate the healing.

Take someone to lunch. Preferably someone who you don’t know well. Ask questions, and listen to the answers. It is not time to defend yourself or your team. It’s time to find out what the business needs from someone else’s perspective. Ask what they think your team’s role is in achieving the business’s success. Ask what they think your team does well, and where the gaps are between what you have now and excellence.

Speak their language

You probably recognize their words, but you need to go out of your way to speak them. To communicate your message, you will need to meet them on their turf. This may seem terribly unfair - “Why can’t they meet me on my terms?!” - but I’m guessing that has not been working out well for you so far.

Not only do you need to use their language, but you need to communicate over their medium. Identifying who your audience is comes first. It’s probably not IRC, and sending your message only by email is a good way for it to be ignored.

If you’re speaking to management, be prepared to write a presentation. Executives especially like to see a slide-deck. It doesn’t have to be slick. It probably shouldn’t have sounds or much in the way of transitions, but a presentation can help to lay the groundwork for a conversation.

Discuss Scope, Staffing, and Priorities

Now that you have described your role, you also need to describe everything that you support.
What Products do you support? It’s entirely possible (likely, even) that the people and teams that you support don’t actually know what you’re responsible for. It could be argued that most of them shouldn’t need to know. But if you have been saying “no” to protect yourself, it’s a sign that you are significantly overextended. You need to have a real discussion with your leadership about your role, scope, and staffing.

In order to have this discussion, you need to prepare. You need to come up with a fairly comprehensive list of the products and teams that you support: every team, its products, and the components and tasks that belong to you for each. Don’t forget all of the components that “nobody owns” but somehow people come to you to fix or implement (CI, SCM, Ticketing, Project, and Wiki tools seem to be common examples). Are you also responsible for Directory Services? Virtualization platform? Mail/Chat/Phones? Workstation purchasing and provisioning? Printers? Do you manage the Storage, Networking, etc.? Don’t be afraid of getting into details. It can help to clearly write out the potential impacts to the company if some of these “hidden” services stop working. Your leadership might not know what LDAP or Directory Services are, but they’ll understand if nobody can log into their machines, they can’t pull information to build reports, and, by the way, nobody can deploy code because it relies on validating credentials…

What is most important to the company? What do you need to succeed? How much more staff do you need? What tooling or equipment would help you work more efficiently? Does code deploy even when it fails testing? How many outages have arisen due to this happening?

Demonstrate Cost and Value and Revisit Priorities


In order to have meaningful discussions with people in your company who aren’t necessarily technical, you need to speak a language they understand. Regardless of team duties, the lingua franca of most teams is money. As Engineers, most of us prefer to think in terms of the tech itself, but a unit of monetary value is a proxy for impact that most non-technical people can understand, even if they don’t grasp the details.

It is a difficult and sometimes uncomfortable habit to get into, but a helpful one: consider the components of cost that go into every incident or task.

What is the cost of a main-site outage? How much revenue does this feature bring in? Why are you spending so much on infrastructure and effort to make that component Highly-Available? Why does it matter that you do that piece of maintenance? Show the negative value of doing things the way they are (Opportunity Cost), versus investing time to improve the automation around it. Describe how doing this maintenance work reduces context switching, unplanned outages, and lost reputation for your company. Describe the benefit in increased visibility to the business, and Agency to be gained by your peers on other teams.
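As a back-of-the-envelope starting point, the direct cost of an outage can be sketched as lost revenue plus the time of everyone pulled in to respond. This is a deliberately simplified model of my own, and every number below is made up; plug in your company's real figures, and remember it leaves out reputation and the context switching that follows.

```python
def outage_cost(hours_down: float, revenue_per_hour: float,
                responders: int, loaded_hourly_rate: float) -> float:
    """Rough direct cost of an outage: lost revenue plus responder time.

    Leaves out harder-to-measure costs such as reputation, churn, and
    the context switching that follows the incident.
    """
    lost_revenue = hours_down * revenue_per_hour
    response_labor = hours_down * responders * loaded_hourly_rate
    return lost_revenue + response_labor
```

With hypothetical inputs of a 2-hour outage, $10,000/hour in revenue, and 4 responders at a loaded $100/hour, `outage_cost(2, 10_000, 4, 100)` comes to $20,800, a figure that tends to get more attention in a meeting than “the database was unhappy”.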

Why put in place these tools to let product teams self-serve? Explain that the features the company’s teams spend so much time and effort (read: “money”) creating mean nothing if those features aren’t available for customers to use, and that unavailable features cost money, both in lost feature billing and in reputation. If they claim that they’re doing Agile but can’t do Continuous Delivery, they’re not really Agile; the whole point of that framework is to improve delivery of value to the customer and the business!

Further, show how systems relate. It doesn’t have to be terribly detailed. Describe how the features that customers use rely on x, y, and z components of infrastructure. Draw the lines from LDAP to storage to your CI tool to testing code to artifacts delivered to Production. Then show some of the other systems that have similar dependencies.
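If you keep those lines as data instead of only as a whiteboard drawing, you can answer “what breaks if this breaks?” mechanically. The following is a minimal Python sketch under a made-up dependency map: each component lists the things that depend on it, and a breadth-first walk collects everything transitively affected.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map: component -> components that depend on it.
DEPENDENTS: Dict[str, List[str]] = {
    "LDAP": ["workstation login", "CI", "reporting"],
    "storage": ["CI", "artifact repository"],
    "CI": ["artifact repository"],
    "artifact repository": ["production deploys"],
    "production deploys": ["customer-facing features"],
}


def impacted_by(component: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return everything that transitively depends on `component` (breadth-first walk)."""
    seen: Set[str] = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

In this toy map, `impacted_by("LDAP", DEPENDENTS)` reaches all the way to the customer-facing features, which is exactly the picture you want leadership to see.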

Once the picture emerges showing how everything is reliant on unexciting things like LDAP, your Storage cluster, and that janky collection of angry shell and perl scripts that keep everything working, realization will begin to dawn.

Congratulations, you’ve just effectively communicated Value.

Align Responsibility with Authority

Are you held responsible for apps written by other people? Who gets paged when “the app” goes down? How does that make sense?

Get Devs on-call for their apps. SysAdmins should be escalated to. Devs can triage and troubleshoot their own apps more readily than you can. They get to call in the cavalry when they get stuck. They don’t need to know everything about the systems, and they don’t need to resolve everything. When a fault occurs and they need help, they stay on the call, pairing with you as you diagnose, troubleshoot, and resolve. That way, they don’t need to escalate to you for that thing the next time it occurs, and can collaborate on automating a permanent fix.

When teams aren’t responsible for their products (when they aren’t paged when it fails), they are numb to the pain that they inflict. They’re not trying to cause pain; they just don’t feel it. It’s especially easy to argue this for teams that proclaim that they use Agile development methods: If they claim to want “continuous feedback”, there is nothing more visceral for providing feedback than the feeling of being awoken by a pager in the middle of the night. When the inevitable exclamation comes that “we can’t interrupt our developers”, ask if it makes sense to interrupt someone else.
Even being aware of the pain (say, hearing how many times you were paged last night) can elicit sympathy, but that’s a far cry from the experience of being paged yourself.

Further, this is what that list of responsibilities is for. Even after asking each team to take responsibility for its own products, you will likely still have a hefty list of services that you provide and are on-call for. As these set in, point out the staffing numbers. This may be a matter of the places that I have worked, but I have never seen a Developer-to-SysAdmin ratio of less than 5-to-1. In most places it is much higher. By joining pager rotations, these teams drastically reduce the load on you. By staying out of them, they are complicit in your burnout.
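The staffing argument is simple arithmetic, sketched here in Python. It assumes, as a simplification, that pages are shared evenly across a single rotation; the page counts and team sizes below are hypothetical.

```python
def pages_per_person_per_week(pages_per_week: float, people_in_rotation: int) -> float:
    """Average pages each person absorbs if the load is shared evenly."""
    if people_in_rotation < 1:
        raise ValueError("a rotation needs at least one person")
    return pages_per_week / people_in_rotation
```

For example, 30 pages a week split between 2 SysAdmins is 15 each. Add the 10 developers implied by a 5-to-1 ratio, and `pages_per_person_per_week(30, 12)` drops to 2.5 per person.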

Stop saying “No”


SysAdmins have a reputation for saying “No”. The people who are asking are probably not trying to make your life worse; they’re probably just trying to get their work done. They might not know what their “simple request” involves, or that it probably isn’t necessary.

But because Responsibility isn’t aligned with Authority, you may have been stuck with the pain of other people’s wishes. You know that fulfilling their request will cause you pain, so, understandably, you say “no”. What often happens next is that they escalate until they reach someone important enough to override you.

This is the basis for why SysAdmins feel steamrolled by everyone else, and everyone else feels held hostage by SysAdmins.

But all hope is not lost.

Stop saying “No”.

“Yes, but …” is a very powerful thing.

“Yes, but …” can be used to get you help.

“Yes, I can set that up for you, but we don’t have capacity to run it for you.” What happened there? You agreed that the request is reasonable. You set expectations of the level of support that you can give. You left the requestor with several options to continue the conversation.
  • They might have hiring reqs that they can’t fill. You can negotiate for some of them to go to your team, as you’re clearly understaffed.
  • Some of their engineers may join your team as a lateral move. They’ll need mentorship and training, but this kind of cross-training is invaluable. It’s a force multiplier. It also sets precedent.
  • They might take the responsibility for the Thing. They run it. They get paged for it. Of course you will probably have to be an escalation point to assist when it fails, but it’s their product. This again sets precedent.

Delegate

Most SysAdmins are stuck doing tasks that provide very little value because they restrict their peers’ access. To my mind, there is one perfect example: “Playing Telephone”.

When I say “Playing Telephone”, I’m talking about the situation where someone (let’s say a Developer) wants logs from the application, but they don’t have access to get them. They request the logs from you. You fetch the log requested and provide it to them. “No, not that log, this log…” You fetch. “Hmm, I’m not seeing what I’m looking for, could you check in here for something that says something like this …?” And so on, and so on…

I don’t know what you’re hoping to prevent by restricting access, but if this scenario ever happens, you should know that you’re providing Negative Value. Again, let’s try to remember that your peers are not out to get you, and can probably be trusted to be reasonable humans if you meet them mid-way.

With that framework in mind, it’s time that you demonstrate some trust, and Delegate to them. Give them access. Your value is not in the logon credentials that you have, otherwise you’re just a poorly-implemented “Terminal-as-a-Service”.

Even better than giving access is giving Tooling. Logging into a server should be an antipattern for most work! You need better tooling. So, with the example of logging, let’s talk tooling.

Logging

First, logging into boxes to get logs is just dumb. Sure, you could wrap a tail command in a Rundeck job, but let’s Centralize those logs while we’re at it.

SysLog is better than nothing, but not by much. Shipping logs is easy, but consuming them as something useful is not. Batteries not included.

If your company wants to spend the money on Splunk, then encourage that. Splunk is a fantastic suite of tools, but I might wave you away from it if you’re not going to use it for everything. It’s going to be expensive, and if you’re not going to spend enough to use it for everything, there will be confusion as to what’s in there, and what’s stored elsewhere.

ELK (Elasticsearch + Logstash + Kibana, sometimes mistakenly simplified to “Logstash”), or a “Cloudy Elk” / “ELK-as-a-Service”, is a good middle ground. ELK is Free (as-in-beer), and very featureful.

Take your Centralized logging of choice, and provide your customers with the url to the web interface. Send them links to the “How to use” docs, and get out of their way!
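Centralized logging gets much more useful when applications emit structured logs. Here is a minimal sketch using only Python's standard `logging` module that writes one JSON object per line, which a shipper such as Logstash can parse without hand-written patterns. The field names are my own choice, not an ELK requirement, and how the lines actually get shipped depends on your setup.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (for a log shipper to collect)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Then `make_logger("checkout").warning("cart %s failed", cart_id)` produces a line your developers can search by level, logger, or message in Kibana, with nobody playing Telephone.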

Terminal-as-a-Service

If someone asks you to “run this command for me”, you need to put a button on it.

You don’t need to RUN-ALL-THE-THINGS!

Rundeck is a fantastic tool to “Put a button on it”. Other people use their CI tools (like Jenkins or Bamboo) for this. My friend Jeremy Price gave an Ignite Talk at DevOpsDays NYC 2015 that describes this.

Personally, I like Rundeck, because it’s pretty easy to make HA, tie it into LDAP for credentials, and manage permissions, and by shipping its logs (see what I did there?), you get Auditing of who ran what and when!
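To show the shape of the idea (this is not Rundeck or its API), here is a toy Python sketch of a “button”: a fixed allowlist of commands, the identity of whoever pressed it, and a JSON audit entry per run that could be shipped to your centralized logs. The job names are hypothetical, and `user` is assumed to come from whatever already authenticated the person (LDAP, in the setup described above).

```python
import json
import subprocess
import sys
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical allowlist: the only "buttons" anyone can press.
JOBS: Dict[str, List[str]] = {
    "python-version": [sys.executable, "--version"],
    "disk-usage": ["df", "-h"],
}


def run_job(name: str, user: str, audit_log: List[str]) -> subprocess.CompletedProcess:
    """Run an allowlisted job and append a JSON audit entry: who ran what, and when."""
    if name not in JOBS:
        raise KeyError(f"unknown job: {name}")
    result = subprocess.run(JOBS[name], capture_output=True, text=True, check=False)
    audit_log.append(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "job": name,
        "exit_code": result.returncode,
    }))
    return result
```

The allowlist is the point: people get the one thing they needed, immediately, and you get a record of it instead of a ticket.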

If you have some data that Must be restricted, try to isolate those cases from the rest of your environment. You shouldn’t have to restrict Everything just because Something does need isolation.

Deploying Code. Yes, to Production

Why would you want to have to deploy other people’s code?! Do you really provide any value in that activity? If the deployment doesn’t go well, you’re launching another game of “Telephone”.
What if you make it easy for them to do it? Empower them with trust and tooling, making it easy to do the right thing! Give them tooling to see that the deploy succeeded! Logs are a start, but Metrics Dashboards that show changes in performance conditions and error rates will make it plain to see if a deployment was successful!
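A dashboard makes the comparison visible to a human, but the underlying check is simple enough to sketch. This minimal Python example compares error rates from equal windows before and after a deploy. The function name and the one-percentage-point threshold are assumptions for illustration; where the counts come from depends on your metrics system.

```python
def deploy_looks_healthy(errors_before: int, requests_before: int,
                         errors_after: int, requests_after: int,
                         max_increase: float = 0.01) -> bool:
    """Compare error rates from equal windows before and after a deploy.

    Returns False if the error rate rose by more than `max_increase`
    (an absolute difference: 0.01 means one percentage point).
    """
    if requests_before == 0 or requests_after == 0:
        raise ValueError("need traffic in both windows to compare")
    rate_before = errors_before / requests_before
    rate_after = errors_after / requests_after
    return (rate_after - rate_before) <= max_increase
```

Going from 5 errors per 1,000 requests to 6 passes; going to 50 per 1,000 does not, and that is the moment the developer who pressed the button should be the one looking.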

This Freedom doesn’t come free. Providing tooling doesn’t absolve the development teams of the need to communicate; in fact, it’s likely that they’ll have to communicate more. They will need to be watching those dashboards and logs to see for themselves the success of every deploy. They will also be more readily on-hand to help triage the inevitable instances when it doesn’t go swimmingly.

US

I say “They” in this article a lot. That is because, by default, most organizations that I have been a part of or heard stories of have had a strong component of “Us-Versus-Them.” It’s only natural for there to be an “Us” and a “Them”, but thinking in those terms should be a very short-term use of the language. Strive for the goal of a “We” in your interactions at work, and reinforce that language wherever possible. While it may not be My job to do “foo”, it is Our job to ensure the team and company are successful.

While that may sound like some happy-go-lucky, tree-hugging, pop-psychological nonsense (and it is…:), the goal here is to get you, the beleaguered SysAdmin, the help that you need in order to improve the capabilities of the business.

CODA

There is so much more to this topic, particularly the shift away from a Systems team supporting a bunch of Project teams to a series of largely self-sustaining Product teams, but that will have to wait for another day.

The psychological damage done to SysAdmins by their peers can make us bitter and cynical. I encourage my people to try to see that “They” aren’t trying to make life difficult for you, but it’s very likely that Authority and Responsibility are misaligned. I likewise encourage my people to take steps to make their lives better. A ship’s course is changed in small degrees over time.

When someone says “DevOps Doesn’t Work”, they’re absolutely correct. DevOps is a concept, a philosophy, a professional movement based in trust and collaboration among teams, to align them to business needs. A concept doesn’t do work, and a philosophy does not meet goals - people do. I encourage you to seek out ways of working better with your fellow people.

GRATITUDE

I’d like to thank my friends for listening to me rant, and my editor Shaun Mouton for their help bringing this article together. I’d also like to thank the SysAdvent team for putting in the effort that keeps this fun tradition going.

CONTACT ME

If you wish to discuss with me further, please feel free to reach out to me. I am gwaldo on Twitter and Gmail/Hangouts, and seldom refuse hugs (or offers of beverage and company) at conferences. Death Threats and unpleasantness beyond the realm of constructive Criticism may be sent to:

Waldo
c/o FBI Headquarters 
935 Pennsylvania Avenue, NW
Washington, D.C.
20535-0001

2015-12-15

On SysAdvent 2015

Knowing that I'd regret it if I didn't, I took on yet another task, and wrote a thing for SysAdvent.

"Fear and Loathing in Systems Administration"

This is a title that I'd had kicking around in my head for a while, and I have started shopping it around as a conference talk.  I'm hoping to get some feedback on the article content in order to tweak the presentation.