Sorry! Looks like this position has been filled by the employer and the listing was closed on 10/06/2020

Full Time Job

Site Reliability Engineer

NBCUniversal

Seattle, WA 09-24-2020

Paid
Full Time
Mid (2-5 years) Experience

Job Description

Responsibilities
The most exciting online news organization is searching for the right person to join an amazing Web Operations Team. The right candidate must be driven, passionate about technology and the web, and must display a level of customer service that is second to none.

Although you must be able to handle a fast-paced, pressure-packed and often excitable work environment (especially during major breaking news events), applying your knowledge to help maintain, support and architect high-performing, highly available web sites is very gratifying.

If the job responsibilities below sound appealing to you, and you have the required skillset, please send us your resume for an extraordinary opportunity to work with one of the most innovative news organizations on the internet.

The core focus of our Site Reliability Engineers (SRE) is ensuring that all the applications and services we support are highly available, performant and fault tolerant.
The SRE team bridges the gap between two major groups in our technology organization: (1) the tier 1 operations support team and (2) the product and software development teams.

We are excellent problem solvers and troubleshooters, especially while working on outages or disruptions of any severity.
The core duties and responsibilities of an SRE fall into these four buckets:

Product Planning
• Attend meetings on the quarterly "objective-based" projects assigned to you so that you understand the product summary and development requirements and can communicate them to the Ops team.
• Provide expertise and guidance as it relates to infrastructure options, risks, impact, effective time estimations and cost.

Infrastructure and Cloud Hosting Expertise
• Strong knowledge of content delivery networks such as Akamai
• Strong knowledge of cloud providers such as AWS and experience with AWS services such as EC2, VPC, Lambda, S3, etc.
• Responsible for DNS management, capacity planning, database and configuration management

Monitoring / Alerting and Security
• Effectively monitor all applications and services by leveraging monitoring platforms such as Splunk, New Relic, Rigor, etc.
• Design and implement targeted alerting to improve the signal-to-noise ratio.
• Continuously endeavor to ensure that the applications and services we support are aligned with our Cyber Security best practices.

Maintenance & Problem Solving
• Perform routine application maintenance tasks, an ongoing responsibility of SREs, accomplished via strategy-building techniques.
• Help create requirements and procedures for implementing routine maintenance.
• Troubleshoot existing information systems for errors and resolve those errors.

Qualifications/Requirements
• A Bachelor's Degree from an accredited college -OR- a four-year high school diploma or its educational equivalent and 10 years of experience in the IT field
• 5+ years' experience in systems engineering and production web hosting environments
• Ability and experience installing, configuring and optimizing performance on Linux (Debian/Apache or CentOS/NGINX) operating systems.
• Experience with the following technologies required: Linux/Debian/CentOS, Apache, NGINX, MySQL, PostgreSQL, MongoDB, NFS, SSL, DNS/Bind, common internet protocols, Akamai edge caching.
• Solid understanding of networking (VPC) and load balancing (ALB/ELB) concepts within AWS. Experience with F5, HAProxy.
• Knowledge and experience writing plugins or running queries for monitoring tools such as New Relic, Splunk and Nagios.
• A solid understanding of revision control systems such as GitHub, including feature branches, committing code, pulls, pushes, etc. Management of a GitHub Enterprise account is a plus.
• Knowledge of DNS, domain registration and hosting.
• Experience setting up and configuring AWS production environments and AWS services/tools.
• Strong focus on organization and attention to detail, writing specs and project plans.
• Ability to work well in a team environment as well as independently as required
• Highly motivated & driven team player
• Proficiency in the English language, verbal and written, with the ability to build and foster relationships
• Must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future.

Desired Characteristics
• A Bachelor's degree in a related IT field + 5 years IT experience
• 5+ years' experience in systems engineering and production web hosting environments, and 5+ years in software development or equivalent automation/scripting experience.
• A foundation in ITIL processes, coupled with an understanding of how these processes are implemented and managed in a critical service delivery environment, is preferred.
• Containers/Tools & Virtualization (Docker, Kubernetes, ECS, EKS, EC2)
• CI/CD tooling (Jenkins, Puppet, Foreman, RunDeck, etc.)
• High-level programming language: coding/scripting experience required (Bash, Groovy, Ruby or Python).
• Infrastructure as Code (Terraform, Helm, CloudFormation)

Jobcode: Reference SBJ-reovn8-3-15-156-140-42 in your application.
