At our core, Electronic Arts is a game maker that connects hundreds of millions of players from around the globe to some of the world's greatest games. The EAX team is driving the strategy and implementation of important initiatives for EA's community of players to connect them to one another and to the games they love to play. These initiatives span the digital touchpoints where players interact with the EA Network, including: EA's gaming service on PC via Origin and the new EA Desktop application; the EA Play subscription program on PC, PlayStation, Xbox, and Steam; EA.com's corporate, marketing and player engagement activations; and a host of other consumer experiences and strategies to connect players across platforms and within our games.
In your role as EAX Site Reliability Engineer, you will oversee the infrastructure platform, which means building orchestration and automation tooling to support the new EA Desktop application at global scale, all with an eye on reliability, resiliency and robustness. Reporting to the EAX SRE Manager, you will be an important contributor to the EAX SRE team based in Vancouver, Canada.
You're someone with a track record of:
• Designing and developing compute infrastructure to support large-scale client service applications, with in-depth experience using AWS infrastructure in particular.
• Designing and developing tools to aid in the orchestration and management of infrastructure and applications.
• Maintaining multi-tenanted Kubernetes clusters following industry-recommended SRE tenets of monitoring, change management, emergency response, provisioning, capacity planning, efficiency and performance.
• Collaborating with peers and partners across many locations to identify and support a shared set of goals and applications.
• Working with CI/CD pipelines to achieve multiple deployments per hour/day.
• Researching and helping adopt technologies that improve team efficiency and capability.
• Investigating complex problems and recommending appropriate data-driven solutions based on available tooling, information and requirements.
• Defining metrics and identifying accurate SLOs to help track and maintain overall quality of service.
• Troubleshooting and addressing high-severity issues in a live service environment.
You also bring the following skills or experiences to our team:
• Understanding of networking fundamentals and configurations (VPN, Virtual IP, VPC, CDN).
• 3+ years of experience in a technical role focused on development or operation of diverse and complex services or legacy systems.
• Experience with full-stack web development, particularly with Node.js, Ruby on Rails, and Go.
• Experience with provisioning tools such as Terraform/CloudFormation/Helm.
• Experience with changes in pace and direction that may occur in a large team environment.
In a typical week, the EAX Site Reliability Engineer could …
• Work on our infrastructure-as-code tooling to support the continued runtime operation of our infrastructure platform supporting millions of gamers.
• Work with technical leads and developers in other EAX application teams to identify and advocate for architecture and service changes to improve reliability and performance.
• Work with developers and other SREs to investigate and resolve performance or functionality issues.
• Experiment with new technologies to solve current challenges.
• Work with vendors to evaluate and adopt their products into our infrastructure platform.
You'll build relationships and work with…
• Other members of the EAX organization, especially technical leads.
• Vendors such as Grafana, New Relic, AWS.