About the job:
Shopify has many critical components, and sometimes they fail. When that happens, the Resiliency team ensures we get back to green as quickly as possible.
We will be setting the foundation for building and running resilient systems at Shopify. This is a team of engineers with in-depth operational knowledge of the entire Shopify stack, who will act as first responders and leaders during incidents.
Our job is to reach resolution as quickly as possible and to guide teams toward building a more resilient Shopify. We will build the tools and systems used to resolve incidents quickly, and we will look to automate away manual toil.
Commerce happens 24/7, and we need a team that can respond whenever necessary. We are hiring for a distributed team to provide coverage from Honolulu, Hawaii (UTC-10).
What you’ll do:
- Respond to automated alerts and execute playbooks.
- Manage ongoing incidents, using your understanding of Shopify to involve the right teams and resolve them as quickly as possible.
- Reduce the noise in our signals so that we can understand the system and debug problems easily.
- Set the standards with teams for building resilient, debuggable systems.
- Ensure we never fail for the same reason twice.
- Follow up on each incident to ensure the appropriate action items are in place and prioritized.
Who you are:
- You have experience handling on-call shifts for mission-critical systems.
- You have been responsible for the tools and processes used to debug and correct failures in those systems.
- You strongly reject the idea that on-call has to be a terrible, disruptive experience.
- You are a generalist developer who is comfortable with multiple languages, such as C, Rust, Ruby, and Go.
- You have done hands-on development with cloud infrastructure (AWS, GCE, Azure, Kubernetes, Docker).
Nice to have but not necessary:
- You have handled multiple IMOC/on-call shifts and have navigated more than one incident through to the RCA process.
- You have experience working with a variety of open-source software, including nginx, Redis, Memcached, and MySQL.
- You are familiar with network and web protocols, from IP to HTTP.
If you’re struggling to stay in your seat right now, amped up by how much this sounds like you and how ready you are for this challenge, we want to hear from you.
This role will set the foundation for a culture shift at Shopify. If this sounds like your dream job, click the “Apply Now” button to submit your application.
This posting will close on January 20, 2020, at 12 p.m. ET.
At Shopify, we are committed to building and fostering an environment where our employees feel included, valued, and heard. Our belief is that a strong commitment to diversity and inclusion enables us to truly make commerce better for everyone. We strongly encourage applications from Indigenous peoples, racialized people, people with disabilities, people from gender and sexually diverse communities and/or people with intersectional identities.
Senior Site Reliability Engineer | Engineering and Development | Hawaii, United States (Remote)