Site Reliability Engineer: Edge Services
As television and media habits change, our mission remains true to the principles that founded Discovery – every day we seek to ignite people's curiosity to engage, entertain and enlighten the world around them through amazing viewing experiences.
The Direct to Consumer (DTC) Group is a technology company within the Discovery brand. We are building a global streaming video platform and a suite of applications to support all of our network's brands globally. We are building modern container-based microservices operated on Kubernetes, covering everything from search, catalog, video transcoding, and personalization to global subscriptions and much more.
Discovery's Stockholm Engineering team is a hub for a growing global engineering group of hundreds of software engineers. As an SRE in the Developer Automation group within DTC, you'll be joining a team that is responsible for building a truly global, self-service platform to enable DTC's growing number of engineering teams to build, test, deploy, and manage the complete operational life cycle of their services in a fully autonomous fashion.
You will work in the platform team as a leader and help us build a common, distributed global infrastructure. Our goal is to enable development teams to build and maintain high-quality services in a smooth and efficient manner. We empower autonomous teams that use standardized tools and best practices, while at the same time helping to lead the way forward. We make sure our platform scales, and we define strategies for ensuring our services are always available.
You'll solve problems related to complex cloud-infrastructure automation, multi-region networking, authentication/authorization, logging/metrics collection at scale, and the management of large-scale Kubernetes cluster deployments across many AWS accounts. You'll architect platform APIs for other teams to build on top of, you'll develop Kubernetes operators, you'll design processes and workflows, and you'll help to do it all in a collaborative team environment using modern, rigorous software development practices that emphasize testability, repeatability, and automation.
• Deep understanding of distributed systems in Kubernetes
• Hands-on experience with service orchestration and management, deployment activities, configuration management, and all necessary automation
• Strong grasp of process isolation, virtualization and containerization concepts, and the ability to apply them when necessary
• Extensive experience with cloud services and the surrounding tech stack (we primarily use AWS at Discovery)
• Knowledge of best practices for service observability, and hands-on experience implementing tooling for it, including metrics, traces and logs
• Knowledge of programming and system administration in Linux environments, preferably on high-throughput, low-latency systems
Meriting skills and competencies
• Kafka experience (we use MSK)
• Operating the ELK stack at scale
• Prometheus (Cortex)
• Istio and Envoy
• Infrastructure as Code
• GitOps and Flux
• Java, Golang and gRPC experience
• OpenTracing (OpenTelemetry)
Explore your (tech)world – Discover us!
As we interview candidates continuously, we kindly ask you to register your application as soon as possible. If you have any questions, don't hesitate to contact [[EMAIL]].
Jobcode: Reference SBJ-gp1124-3-236-122-9-42 in your application.
Discovery, Inc. is the global leader in real life entertainment. We serve passionate fans with content that inspires, informs, and entertains, providing leadership across deeply loved and trusted brands, such as Discovery Channel, TLC, Animal Planet, HGTV, Food Network, and Travel Channel.