Job Description

Who we are

As television and media habits change, our mission remains true to the principles that founded Discovery - every day we seek to ignite people's curiosity to engage, entertain and enlighten the world around them through amazing viewing experiences.

The Direct to Consumer Group (DTC) is a technology company within Discovery that is responsible for building a global streaming video platform to support a broad collection of Discovery's diverse brands around the world including Discovery, TLC, Food Network, Investigation Discovery, Animal Planet, Science Channel, HGTV, Eurosport, Motor Trend, and many more.

DTC's software engineering teams build applications for the web, mobile, tablets, connected TVs, consoles, and other streaming devices. Those applications are backed by a fleet of modern, cloud-native microservices deployed to Kubernetes within AWS. It is a fast-growing, global engineering group crucial to Discovery's future.

Who you are

We're hiring a talented Staff Engineer to join our reliability engineering team, someone who is passionate about using software-based approaches to solve complex infrastructure challenges and automate those solutions. You'll be joining a team that is responsible for building a truly global, self-service platform to enable DTC's growing number of engineering teams to build, test, deploy, and manage the complete operational life cycle of their services in a fully autonomous fashion.

You'll be a voice of reason and advocate for engineering best practices. You will help drive technical decision-making, particularly with regard to the architectural direction of the platform components. You'll help solve problems related to complex cloud-infrastructure automation, multi-region networking, cross-cluster service meshes, authentication/authorization, and the management of large-scale Kubernetes cluster deployments across many AWS accounts.

To be successful, you'll need to be deeply technical and capable of holding your own with other strong peers. You possess excellent collaboration and diplomacy skills. You have experience practicing infrastructure-as-code as well as related areas including site reliability engineering, CI/CD, DevOps, and Agile development. In addition, you'll have strong systems knowledge and troubleshooting abilities.

If you love solving problems at scale, prefer to build scalable, reliable, and testable software to automate infrastructure management, are an ace troubleshooter, and are deeply technical, then this is the role for you!

Key Responsibilities
• Plan, lead and execute complicated technical projects that interact with a wide variety of teams within the company.
• Work with internal customers and stakeholders to drive the design, development, and support of our Discovery+ cloud platform.
• Work on providing a highly automated infrastructure for deploying and scaling a distributed, multi-tenant, high-performance compute and data platform.
• Develop software and tooling to facilitate greater automation and operability of services.
• Make high-impact decisions driving how and what software gets built. Your decisions are often right, and you are persuasive in delivering your suggestions and ideas to your team.
• Mentor senior engineers, overseeing their designs, code quality, and integration into a team. Your success is judged as much on your own productivity as on the positive impact you have on engineers around you.
• Provide guidance on design, coding, and operational best practices, and have a track record of applying these best practices to software that you have worked on. You can propose and create best practices proactively where none exist.
• Utilize your deep experience and problem-solving skills to help prevent and investigate production issues as well as participate in a shared on-call rotation.

Skills and Experience
• At least 9 years of overall experience in software, systems, and infrastructure
• At least 4 years of experience managing public cloud infrastructures, such as AWS, GCP, or Azure, including design, implementation, and maintenance of large-scale computing environments.
• Strong software development skills in languages such as Go or Java. Must have CS fundamentals and a track record of implementing highly reliable software.
• Strong knowledge of, and implementation experience with, Terraform, Ansible, Salt, Pulumi, CloudFormation, or similar tooling.
• Strong experience in creating and operating highly available APIs serving millions of users.
• Deep understanding of Docker, Linux, networking, distributed systems, microservice architecture, cloud design patterns, and security.
• Experience with container orchestration technologies such as Kubernetes, OpenShift, DC/OS (Mesos / Marathon), Titus, AWS EKS, or Google GKE is strongly desired.
• Able to calmly and efficiently debug, troubleshoot, and resolve complex technical issues.
• Solid interpersonal skills conducive to a team environment.
• Self-driven & motivated, with a strong sense of ownership, work ethic, and a passion for problem-solving.
• Exceptional written and verbal communication skills. Can effectively communicate a vision and plan to any audience.
• Experience working across product, engineering, and analytics teams to evaluate new ideas, discuss technical concepts, create scalable designs, implement new models, and make tradeoffs to remove roadblocks.
• BS or MS in Computer Science, or equivalent.

