Full Time Job

Site Reliability Engineer, Cloud Infrastructure


Stamford, CT 06-01-2020
  • Paid
  • Full Time
  • Mid (2-5 years) Experience
Job Description

Meet Peacock, NBCUniversal's new, wildly entertaining streaming service that combines timeless shows and movies with timely news, sports and pop-culture. We're growing our team of smart, hungry, and upbeat doers who crave the chance to build something new at the epicenter of content, tech, and culture. We need fearless leaders and pop-culture fiends who work hard and fan hard. Creative problem-solvers who just so happen to be the reigning champs at Parks & Rec trivia night. So if this sounds like you, join our flock. And we promise, we won't put your stapler in Jell-O.

Role Purpose
The Site Reliability Engineer will be part of the Reliability & Performance team and will be responsible for maintaining the networking and infrastructure of the cloud platforms utilized to operate NBC's Direct-to-Consumer platforms.

• Work with Site Reliability Engineering teammates and Software Delivery teams to determine and implement cloud networking, monitoring, and infrastructure requirements
• Ensure that networks and infrastructure are highly available
• Develop methodologies to safely deploy and test network and infrastructure changes
• Design, create and deliver infrastructure, code or services to improve the availability, scalability, latency, and efficiency of our internal or customer-facing services
• Troubleshoot and resolve network and infrastructure problems
• Design multi-region/multi-cloud fault tolerant systems
• Drive DevOps culture across the department by providing consultancy to delivery teams
• Provide support for operations and delivery teams to remediate production issues as appropriate
• Build cloud-agnostic solutions that can be quickly deployed against a wide variety of cloud computing providers
• Participate in a 24/7 on-call rotation

Basic Qualifications
• Bachelor's degree in Computer Science, Information Technology or a relevant field
• Minimum three (3) years of experience in a DevOps or Site Reliability Engineering role
• Experience with CDN delivery providers such as Akamai
• Demonstrated experience with large scale 24/7 production environments
• Ability to follow established processes and workflows to ensure that all work is completed per group standards
• Configuration management / Infrastructure as Code (example: Ansible/Puppet/Chef/Terraform)
• CI/CD (Jenkins / Concourse / GoCD / GitLab)
• Networking (entry-level, e.g. load balancing, routing, switching)
• Linux (Ubuntu, Debian, CentOS, RedHat)
• Containerization (entry-level, e.g. Docker/Kubernetes)
• Cloud Platforms (AWS/GCP/Azure)
• Monitoring (Prometheus, Grafana, Nagios)
• Logging (ELK/Splunk)
• Scripting / System-Programming (Python, Go, Bash, Java)

Eligibility Requirements
• Must be willing to work in Stamford, CT
• Must have unrestricted authorization to work in the United States
• Must be 18 years or older
• Availability to travel as required

Desired Characteristics
• Experience with a digital media direct-to-consumer business highly preferred
• Certification in AWS, GCP, or Azure a plus
• Exceptional verbal and written communication skills, including comfort communicating with technical and non-technical colleagues and executives
• Ability to understand large complex software systems and their interdependencies