Manager - Site Reliability Engineer, Core
Los Gatos, CA US
Who We Are
Netflix is the world's leading streaming entertainment service with over 182 million paid memberships in over 190 countries enjoying TV series, documentaries, and feature films across a wide variety of genres and languages.
We deploy hundreds of microservices, across multiple regions, tens of thousands of instances, and have millions of client devices with our software installed. That's a lot of infrastructure and software that collectively work together to give customers an end product they love to use. Ensuring availability and reliability across that scale is a task that's taken on by our amazing Netflix engineers through full operational ownership of their software.
Our team, Insight Engineering, builds software to provide real-time operational insight to thousands of engineers and teams across Netflix. This requires collecting, streaming, and persisting operational data, HUGE amounts of it, and making that data accessible through flexible APIs and visualizations.
Our team is looking for a Senior Distributed Systems Engineer to help build and manage our Tracing infrastructure across the entire Netflix service. If you enjoy working in a unique culture of Freedom & Responsibility, designing and building critical systems at scale that are relied on across the organization, you will have the opportunity to:
• Architect, design, and build systems that can effectively collect, stream, ingest, index, and persist billions of Zipkin traces every day.
• Design and build APIs for partner teams to query the data they need quickly and efficiently from our tracing system, and perform new analytics and queries that give insights into their services.
• Enable client integrations and features for the various languages and runtimes in use.
• Work closely and consult with our customers and partners to implement new capabilities over time that can scale to support our rapid growth and global expansion.
What You Bring to the Table
• Proven expertise in building and operating scalable distributed services for real-world use cases
• Deep knowledge of concurrency, resiliency, caching, HTTP and REST
• Experience handling time-sensitive and/or large data sets
• Good understanding of various programming languages (e.g., Java, Node.js)
• Experience with Tracing (in particular Zipkin) is a plus
• A customer-focused attitude
Sharing Is Caring
In this group, you'll have a chance to create software that is state of the art and foundational. Because of Netflix's desire to share technology and concepts, you'll be in the rare position of both working on this and sharing this knowledge with your peers outside Netflix. We believe this is unique to Netflix, and if it sounds amazing to you, we should talk.