Full Time Job

Site Reliability Engineer, Data

Hulu

Santa Monica, CA 06-17-2021
  • Paid
  • Full Time
  • Mid (2-5 years) Experience
Job Description
The Data Reliability Engineering team for Disney Streaming Services (DSS), a segment of Disney Media & Entertainment Distribution (DMED), is responsible for maintaining and improving the reliability of DSS' big data platform, which processes hundreds of terabytes of data and billions of events daily. We are looking for a Data Reliability Engineer to help us in our ongoing mission of delivering an outstanding service to our users and making DSS more data-driven. You will also work closely with our partner teams on our incident management processes, including post-mortems, root cause analysis and preventing incident recurrence. You will make an outsized impact in an organization that values data as its top priority. Are you passionate about reliability engineering, automation and software delivery excellence? If so, we believe this is the role for you.

WHAT YOU'LL DO
• You will work closely with your counterparts on the Data Reliability Engineering team and our partner teams to improve automation, resiliency and maintainability of our data systems.
• You will build solutions to continually improve our software release and change management process using industry best practices to help DSS maintain legal compliance.
• You will help to design and build systems that improve the reliability, resiliency and maintainability of our big data systems and products.
• You will help to build out observability and intelligent monitoring of data pipelines and infrastructure to achieve early and automated anomaly detection and alerting.
• You will plan service capacity and testing automation, design business continuity and disaster recovery plans and processes, and work with the engineering team on implementation.

WHAT TO BRING
• 2+ years of experience working in Linux environments, and proficiency with cloud environments (AWS)
• 2+ years of hands-on experience in Reliability Engineering for high-performance, scalable and distributed data systems with a focus on automation
• Detailed problem-solving approach, coupled with a strong sense of ownership and drive
• A strong bias to action and a passion for delivering high-quality data solutions
• Experience coding in one or more of the following programming languages: Python, Java, or Scala
• Understanding of CI/CD principles and familiarity with version control systems (Git)
• Attention to detail and quality, with excellent problem-solving and interpersonal skills

Jobcode: Reference SBJ-g3kkj9-35-172-136-29-42 in your application.