Full Time Job

Site Reliability Engineer

Sinclair Broadcast Group

Cockeysville, MD 01-19-2021
 
  • Paid
  • Full Time
Job Description

Sinclair Broadcast Group is looking for an exceptional Site Reliability Engineer to maintain and continually improve our virtualized and cloud-based deployments. Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The Site Reliability Engineer's mission is to ensure uninterrupted service for Sinclair customers and act as a force multiplier for Sinclair product teams to deliver better products faster.

The SRE team's mission is to build foundational services, tooling, and automation that allow product teams to release and scale reliably and predictably. SREs are team players who support product teams as needed to advance the architecture and performance of software systems. They also train their peers in topics such as debugging distributed systems, building self-healing infrastructure, and ensuring availability to fuel the company's growth.

Responsibilities:
• Operate, monitor, and maintain high availability of software services for Sinclair products running in multi-region AWS cloud and on-premises VMware environments
• Automate, scale, and manage our VMware and AWS cloud infrastructure
• Work with multiple stakeholder teams to establish service level objectives, and monitor services to ensure those objectives are met
• Continually improve cloud operations automation and tooling to monitor and maintain enterprise cloud-based applications
• Troubleshoot infrastructure and application issues and work with development and operations teams to resolve them
• Identify and remediate potential points of failure in the infrastructure and applications
• Execute automation for known cloud-operations tasks, and create new automation for new situations or issues you encounter; automate everything
• Collaborate with a great team to maintain, monitor, and improve amazing applications that deliver content for end-users
• Facilitate blame-free root cause analysis meetings after a production-systems incident so that the team can learn from mistakes and improve our systems and runbooks
• Participate in stress, security, and performance testing
• Be vigilant about security and adhere to best practices to secure our cloud infrastructure and real-time platform
• Design, write, and deliver software and automation to dramatically improve the availability, scalability, latency, and efficiency of Sinclair services
• Plan and perform security patches on our applications and underlying infrastructure
• Help secure our data and access policies to reduce risk
• Take pride in the quality of your code, the work it takes to make great software, and the value delivered to the end-user
• Troubleshoot application-related support requests to locate the problem area, resolve those which are within your skillset, and forward the others to the appropriate staff
• Perform infrastructure operations and management tasks to provision new customers, address operational requests, and keep the application running efficiently and effectively

Experience, Skills & Competencies:
• Deep understanding of AWS cloud services and how to leverage them for compute, storage, and managed services
• Deep understanding of VMware environments and how to leverage them for compute, storage, and managed services
• Experienced with modern DevOps engineering practices and comfortable with diverse technical problem sets across the entire technology stack, including the virtualized hardware
• Possesses a deep understanding of the Linux operating system and is at home on the command line
• Versed in infrastructure-as-code practices using tools such as Terraform
• Familiar with using Ansible for configuration automation
• Proficient in scripting and developing automation in Python and Bash, or similar programming languages
• Used to keeping everything you do in source control and automating (scripting) any task you have to do more than once
• Understands modern approaches to software security and knows what needs to be done to secure software systems and cloud-based infrastructure
• Equipped with a proactive security mindset and a solid understanding of information security and privacy principles
• Experienced in protecting modern, cloud-hosted operating environments using defense-in-depth strategies
• Comfortable operating in environments subject to regulatory, compliance, and risk-based security requirements
• Able to effectively troubleshoot issues across the entire stack from UI -> API -> Application -> Database, including the operating system and the underlying (virtual) hardware
• Enthusiastic about cutting-edge technologies and fresh challenges that come with them
• Possesses a service- and customer-oriented mindset and a willingness to dig into the application rather than throw the problem over the wall
• Excellent verbal and written communication skills, with the ability to translate complex topics into easy-to-understand language to educate stakeholders and executives on site reliability concepts and designs

Ideal Qualifications:
• Excited about monitoring technologies, the metrics they provide, and using that data to extract information about the performance characteristics and error modes of a cloud-based software stack
• Proficient as a developer, with experience writing code and solving problems in Python
• Experienced maintaining and supporting feature-rich applications using modern software frameworks
• Understanding of computer networking and how it applies in cloud environments
• Related technical experience in cybersecurity, preferably in a cloud environment
• Experience securing corporate networks, cloud networks, and VPNs
• Holds a bachelor's degree in Computer Science, Electrical Engineering, or another scientific or technical discipline

Sinclair Broadcast Group, Inc. is proud to be an Equal Opportunity Employer and Drug-Free Workplace!

Jobcode: Reference SBJ-roqex2-18-221-187-121-42 in your application.