Full Time Job

Lead Site Reliability Engineer


San Francisco, CA 11-08-2021
  • Paid
  • Full Time
  • Entry (0-2 years) Experience
Job Description
Do you want to be part of a team that makes streaming magic happen on one of the most reliable streaming services in the world? Our SREs provide expert engineering services in cloud automation and reliability engineering to all of the services that power streaming for Disney+, ESPN+, Hulu, Star+ and more, home to 100 million+ subscribers and ESPN fight nights. We are passionate about keeping our services running with maximum uptime and minimum latency so that our subscribers have the best possible experience with all of our content.
Our SRE teams are leading efforts to improve, optimize, and increase the availability of applications across the Disney Streaming organization, taking a consultative approach to SRE by educating, mentoring, and delivering automation that fosters performance and resiliency best practices.

We are seeking an individual with prior experience as a Team Lead. You should possess leadership and mentoring qualities, a vision for building reliable systems, and the ability to manage and guide technical solutions and improvements to completion. Working hand-in-hand with service development teams, a team lead is expected to set short- and long-term goals that drive the reduction of toil through automation, improvements in incident management and observability, and new tooling solutions that expose and solve critical reliability issues. The team lead also ensures that documentation is well maintained and helps foster cross-team sharing of knowledge and awareness. The role is highly collaborative and involves significant interaction with fellow team members, stakeholders, and SRE leadership.

Fostering innovation is a critical component of success here at Disney Streaming. Therefore, the ideal candidate will also need to be highly adaptable to change and able to pivot when required.

Responsibilities:
• Make technical decisions for the team
• Structure short- and long-term work for the team (roadmapping)
• Mentor and support team members in Site Reliability Engineering best practices so that the team delivers for its stakeholders
• Work closely with development teams and provide them with technical guidance to ensure new features have the proper operational support and maintainability
• Develop software for the purposes of automating, monitoring, and maintaining deployed infrastructure and services
• Translate the business domain into a digestible format for engineers
• Assist leadership with regular engineer reviews, working with each engineer to develop and execute individual Career Development Plans and targets
• Encourage and promote Company culture among team members
• Represent the Company at conferences and meetups

Basic Qualifications:
• Track record of working as a Lead Site Reliability Engineer
• Experience designing and implementing automation tools
• Experience in at least one programming or scripting language, preferably Python or Go
• Experience working with cloud platforms, preferably AWS
• Experience with modern infrastructure services and concepts such as containerization, distributed systems, microservices, etc.
• Experience running and monitoring large-scale distributed systems
• Understanding of Software Engineering principles and patterns
• Track record of working with Linux systems in production

Jobcode: Reference SBJ-gmw2om-52-205-167-104-42 in your application.

Company Profile

Disney Streaming Services is responsible for developing and operating The Walt Disney Company’s direct-to-consumer video businesses globally, including ESPN+ and Disney+.