Manager - Site Reliability Engineer, Core
Los Gatos, CA US
Netflix is the world's leading internet TV network, and we strive to bring engaging stories from many cultures to people all across the globe. With over 160 million members globally, the data platform is the core foundation enabling all of our product decisions that directly impact our customers' experience when they watch Netflix.
Our platform runs tens of thousands of jobs and processes over a trillion events a day. We support over a thousand data analysts, data scientists, and engineers across the company to make business decisions. Having excellent reliability and insights into such a large and distributed cloud infrastructure is paramount to the success of Netflix.
This team is responsible for building services and tools that help discover insights and improve observability, with the goal of improving the efficiency of the data platform. In addition, this team focuses on streamlining incident management and on identifying and socializing operational best practices. It is truly a combination of art and science to build this virtuous cycle to improve the overall user experience of the platform!
We are looking for an engineering leader to realize this vision for the team, to help evolve our operations practices, and to build infrastructure to support our continued scaling for global expansion.
Specifically, you and your team will:
• Drive the ongoing vision of improving user experience and insights across all batch data analytics and stream processing infrastructure.
• Build services to change the game on how we qualify system failures and quantify system health. Apply machine learning to qualify failures across the whole data infrastructure stack and to perform SLA prediction.
• Partner with the rest of the data platform team to drive our vision towards a 'data-driven' data platform.
• Evangelize operational methodologies. Continue to look for opportunities to automate and build tools to lower operational barriers, improve clarity on problematic areas, and improve reliability.
• Grow and develop a team of top-talent senior software engineers.
Qualifications
• Experience in people leadership and technical leadership.
• A passion for building and motivating teams to reach their potential.
• Strong partnership skills for driving multi-functional projects forward.
• Experience in building internal infrastructure that is shared across teams.
• Experience in running large-scale distributed systems is a plus.