
Full Time Job

Site Reliability Engineer

Turner

Atlanta, GA 07-21-2021
 
  • Paid
  • Full Time
  • Mid (2-5 years) Experience
Job Description
The Job
WarnerMedia seeks a Site Reliability Engineer to lead SRE efforts within our WarnerMedia Technology & Operations (WMTO) organization. The SRE team owns and manages the infrastructure stack for our unified video delivery platform, a core set of products and workflows that power video acquisition, encoding, delivery, and playback across our WarnerMedia brands.

As a hands-on Site Reliability Engineer, you will be a key contributor in maintaining and improving our highly available, highly scalable video systems infrastructure, using containers, cluster management, cloud services, and performance tools to keep our systems available in 24x7 environments. You will help build platform automation, configuration management, and service administration for cloud and on-prem environments. You will work closely with our product developers to review cloud architecture footprints, set up cloud stack resources, contribute code for infrastructure needs, set up CI/CD pipelines, and develop new tooling. Our tech stack includes AWS, Kubernetes, Docker, Terraform, Postgres, Mongo, Jenkins, and Elasticsearch.

The Site Reliability Engineer will need to be strong in DevOps and SRE practices, combining software and systems engineering to build large-scale, distributed, fault-tolerant systems. You will lead full end-to-end SRE projects that include cloud solution design, environment creation and configuration, deploying and supporting cloud services, and writing strong technical documentation. You will assist with critical incident management and on-call rotations for after-hours support.

The Daily
• Code, scale, and support the cloud stack that powers WarnerMedia live streaming and VOD workflows, with performance and cost efficiency as primary goals
• Design and implement cloud infrastructure, making optimal decisions for availability, reliability, scalability, maintainability, and security
• Build tooling and services to track application/system health and performance
• Implement Infrastructure as Code using best practices to standardize the setup of stack resources across multiple environments
• Build, improve, and support CI/CD pipelines with developer buy-in and full automation
• Evaluate emerging cloud technologies for adoption via discovery and proof-of-concepts
• Develop operational automation and self-service frameworks for developers and support teams
• Monitor and troubleshoot applications and cloud infrastructure across all environments
• Provide tiered on-call support for critical application incidents during off-hours and weekends
• Write robust technical documentation for systems and processes

The Essentials
• Bachelor's degree in Computer Science or equivalent experience
• Deep understanding of AWS, containers, and Kubernetes (EKS, RKE)
• Experience in designing, implementing, and supporting container and serverless cloud stacks
• 2+ years as a site reliability / DevOps engineer for enterprise-scale systems
• 2+ years of experience in cloud and container designs, architectures, and migrations
• 2+ years of experience in AWS cloud technologies, with broad exposure to the AWS suite of services, including S3, EFS, RDS, ECS, EKS, ALB, Route 53, etc.
• 2+ years of experience in the software development lifecycle and application modernization
• Strong experience with Unix/Linux system administration at scale
• Experience using source control (git) and CI/CD pipelines
• Ability to write tooling in Golang, Node.js, Python, shell scripting, or other languages
• Strong problem solving and troubleshooting skills for incident remediation
• Ability to work in a dynamic, fast-paced environment
• Clear and effective communicator with both technical and non-technical audiences
• Experience with the full digital video stack is a plus: video encoding (CMAF/DASH/HLS), adaptive bitrate packaging, CDN delivery, DRM solutions, AWS video cloud services (MediaLive, MediaConvert, MediaPackage), and video playback

Jobcode: Reference SBJ-gk373q-3-138-105-124-42 in your application.