Site Reliability Engineer (DevOps/Cloud Infrastructure)
The Production Infrastructure & Engineering (PI&E) organization provides the essential platforms and infrastructure hosting solutions that power EA's live services. Our charter is to make EA's games and services available to all players anytime and anywhere. To do this, we focus on the high availability of infrastructure, primary services, and studio services. We aim to help developers experiment and build new games quickly with on-demand infrastructure services and workflows that promote rapid development in the cloud. In all of this, we focus on being there for players where and when they want to play.
As a Site Reliability Engineer, your role covers the entire lifecycle of a product, from helping developers with architecture and delivery to on-call incident response and triage. Your primary focus will be automation and continuous integration/delivery, with an emphasis on solving operations issues using software. You will report to the Senior SRE Manager.
• You will build and operate distributed, large-scale, cloud-based infrastructure using modern open-source software solutions.
• You will use automation technologies to ensure repeatability, eliminate toil, reduce mean time to detection and resolution (MTTD and MTTR), and repair services.
• You will perform root cause analysis and post-mortems with an eye towards future prevention.
• You will design and build CI/CD pipelines.
• You will create monitoring, alerting and dashboarding solutions that improve visibility into EA's application performance and business metrics.
• You will produce documentation and support tooling for online support teams.
• 3+ years of experience with virtualization, containerization, cloud computing (AWS preferred), VMware ecosystems, Kubernetes, or Docker.
• 3+ years of experience supporting high-availability production-grade infrastructure and applications with defined SLIs and SLOs.
• Systems Administration experience, including a strong understanding of Linux / Unix.
• Network experience, including an understanding of standard protocols/components.
• Automation and orchestration experience with tools such as Terraform, Helm, Chef, Puppet, or Packer.
• Experience writing code in Python, Golang, or Java.
• Experience working with distributed systems.