Site Reliability Engineer (Lead): Developer Automation
The Direct to Consumer Group (DTC) is a technology company within Discovery that is responsible for building a global streaming video platform. The platform supports a broad collection of Discovery's brands around the world, including Discovery, TLC, Food Network, Investigation Discovery, Animal Planet, Science Channel, HGTV, Eurosport, MotorTrend, and many more.
DTC's software engineering teams build applications for the web, mobile, tablets, connected TVs, consoles, and other streaming devices. Those applications are backed by a fleet of modern, cloud-native microservices deployed to Kubernetes within AWS. It is a fast-growing, global engineering group crucial to Discovery's future.
As an engineer in the Developer Automation group within DTC, you'll be joining a group that is responsible for building a truly global, self-service platform to enable DTC's growing number of engineering teams to build, test, deploy, and manage the complete operational life cycle of their services in a fully autonomous fashion.
Your role will focus on the development of the platform core and common platform services. You'll solve problems related to complex cloud-infrastructure automation, multi-region networking, cross-cluster service meshes, authentication/authorization, and the management of large-scale Kubernetes cluster deployments across many AWS accounts. You'll architect platform APIs for other teams to build on, develop Kubernetes operators, and design processes and workflows. You'll do it all in a collaborative team environment using modern, rigorous software development practices that emphasize testability, repeatability, and self-service automation.
The ideal candidate for this role will have 8+ years of professional experience spanning the entire software stack, as well as deep expertise in at least one technology from each of the following groups:
• Containerization & Container Orchestration at Scale (e.g. Kubernetes)
• Container Networking & Distributed Service Meshes (e.g. Istio, Linkerd)
• Cloud Infrastructure Automation (AWS strongly preferred) (e.g. CDK, Crossplane)
• Linux System Administration
• Distributed Systems Development (e.g. asynchronous communication patterns, consensus algorithms, distributed transactions)
• Services Programming (e.g. Go, Java, Kotlin, Scala, Python, Ruby)
• Systems Programming (e.g. C, C++, Rust)
In addition, your technical expertise should match well to the following:
• Deep understanding of distributed systems in Kubernetes
• Hands-on experience with at least one IaC tool (e.g. CDK, Terraform, Crossplane)
• Experience with the development and operation of high throughput, low-latency systems
• Hands-on experience with automating development workflow pipelines (CI/CD)
• Operational experience (e.g. on-call rotation, incident response)
• Ability to collaborate effectively with remote peers across disparate geographies and time zones
• Excellent written and verbal communication skills with particular emphasis on technical documentation (including diagramming)
• Strong CS fundamentals