What the heck is a Game Day?

A Game Day is a day on which we simulate a failure or event to test systems, processes, and team responses. The purpose is to actually perform the actions the team would take if an exceptional event happened.

Sorry, it’s not a day where we play games all day.

Is a Game Day a waste of time?

Not at all. In fact, it’s a fun way to ensure that we know how to respond to outages and communicate with stakeholders (e.g., the payroll team, trust, etc.). It also gives us an opportunity to demonstrate our recovery and debugging prowess.

🍽️ Plus, lunch is provided.

So you’re going to cause problems to systems on Game Day? How do we know what to watch out for?

On the day, you’ll be given a brief description of which system will be affected, along with start and end times. Existing monitors will be in place for any system that will go down or be adversely affected.

Note that the system can be adversely affected any number of times for any number of reasons. It’s up to you to diagnose, triage, and fix.

One last note: any affected system will be in a staging environment, never production.

Can I prepare?

Sure. No specific preparation instructions are provided, but these are good pre-reads:

https://www.atlassian.com/incident-management/handbook/postmortems#root-cause-categories-and-their-actions

https://sre.google/sre-book/postmortem-culture/

What happens if an actual production issue occurs?

We immediately stop the Game Day, fix the issue, and reschedule.

Who’s the stakeholder?

@James will act as your stakeholder throughout the process. Assume he is a customer/consumer of the product that went down or is not working correctly.