Description
AWS Incident Response is at the heart of the high availability of Amazon Web Services. We make customer-impacting events shorter and less frequent by providing large-scale event and incident management. Our automated tooling quickly identifies the cause of an issue and helps mitigate its impact, and much of our engineers' time is spent on projects to improve that tooling and automation. We also provide manual incident management for AWS and other Amazon groups, directing the resolution of an issue with service teams and diving deep into those events to drive improvements to the tooling. It's an exciting time to join our team as we are rapidly growing and expanding our offerings.
As a Support Engineer on the team, you will lead projects and build processes to reduce the duration, frequency, and impact of issues within the AWS and Amazon infrastructure. You will also spend a portion of your time directing the resolution of high-visibility incidents by leading conference calls and teams across the globe. Using data learned from those incidents, you will drive further improvements into our automation, tooling, and processes so that the next event is shorter or avoided entirely. You will participate in project teams to expand use of our tooling to additional areas across Amazon. You'll also have the opportunity to grow your coding skills by taking on development projects matched to your ability level. If you're looking for a supportive team with great growth potential and an opportunity to make a huge impact, this is the team to join.
Key job responsibilities
Drive the resolution of large-scale, customer-impacting issues as part of a team rotation, including some weekends and holidays
Lead projects and teams across the globe to drive operational improvements
Design, build, and enhance incident detection and response tools
Identify and troubleshoot recurring platform issues and own projects to drive improvements
Create and review documentation; design new standard operating procedures
Mentor peers in your areas of technical and operational strength
A day in the life
A Support Engineer on the AWS Incident Response team has visibility into all AWS products and services. Because we work with internal teams across AWS, there are limitless opportunities to learn.
When on call, we provide incident management through conference calls and automation to support internal AWS teams during the response to, diagnosis of, and mitigation of large-scale events.
When not on call, we build processes and automation to help AWS experience fewer, shorter, and smaller customer-impacting incidents.
About the team
The AWS Incident Response (AIR) team is Amazon’s central defense against large-scale incidents and drives operational excellence across all of Amazon's businesses. Our key offering to Amazon is best-in-class incident management. Our engineers are front and center in driving down event duration through experience in operational excellence, current best practices, and incident management tooling.
We are open to hiring candidates to work out of one of the following locations:
Dublin, D, IRL
Basic Qualifications
3+ years relevant work experience
Experience using and troubleshooting Linux or Unix based systems
Experience troubleshooting and resolving technical issues in a distributed environment
Solid grasp of networking fundamentals
Experience automating tasks using scripting languages
Preferred Qualifications
Experience building services for a large-scale cloud platform such as AWS
Knowledge of current best practice frameworks such as ITIL
Experience driving and managing large troubleshooting efforts
Experience dealing effectively with internal technical teams during problem resolution
Ability to operate and communicate effectively under pressure
Experience dealing effectively with internal customers during problem resolution
Effective organizational skills and the ability to maintain a consistently high standard of operations in a busy environment
Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to learn more about how we collect, use and transfer the personal data of our candidates.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need an adjustment during the application and hiring process, including support for the interview or onboarding process, please contact the Applicant-Candidate Accommodation Team (ACAT), Monday through Friday from 7:00 am to 4:00 pm GMT. If calling directly from the United Kingdom, please dial +44 800 086 9884. If calling from Ireland, please dial +353 1800 851 489.