Senior Site Reliability Engineer
San Diego, CA
This is a fast-paced position with the Shared Services Site Reliability team, supporting the establishment of a linearly scalable, highly available, fault-tolerant, globally distributed data and services platform. This role will own the technology transformation of the SRE team to include existing and new services supporting the dramatic customer growth on the platform.
You are passionate about delivering highly scalable software in the cloud. You are passionate about the development of automation and tools around deployment, scaling, and operational processes. You understand that the quality of your work determines if someone on your team will need to answer the phone in the middle of the night.
The individual's primary responsibilities will include contributing to the implementation and delivery of the end-to-end automation platform to support DevOps activities, with a focus on developer self-service capabilities. This role includes team leadership, coaching, mentoring, and day-to-day engagement in a highly engaged, fast-paced team. This position requires extensive technical expertise in and knowledge of container-based cloud environments, as well as database expertise, preferably Oracle. Broad industry knowledge, strong customer focus, and excellent communication skills are a must.
• Good Kubernetes experience
• Extensive AWS CloudFormation experience
• Contributes to a team of engineers to deliver highly available, self-service SRE capabilities.
• Showcases uncompromising ownership of outcomes and deliverables
• Role model for customer-focused delivery for both internal and external customers
• Experienced engineer who drives Operational Excellence within the team
• Builds and cultivates Agile engineering capabilities and quality engineering practices.
• Forward-looking engineer with execution know-how to take SIE to the next level of SRE & DevOps practices
• 5+ years of related experience
• Strong proficiency & working knowledge in one or more automation frameworks such as Chef, Puppet, Ansible, CFEngine, etc.
• Proficiency and working experience in operating/administering in AWS (AWS certification preferred; at minimum "AWS Certified SysOps Administrator" or "AWS Certified Solutions Architect")
• Clear understanding of the CI/CD process.
• Jenkins (preferred)/GoCD/Travis
• GitHub (preferred)/P4/SVN
• Deployment strategies (Blue/Green and so on)
• Strong Linux/Unix skills (preferably with some past sysadmin experience)
• Experience in networking and other data-center technologies
• Experience setting up and administering Monitoring & Alerting (preferred to have knowledge/experience with Splunk, Datadog, Grafana, and CloudWatch)
• Demonstrated troubleshooting skills
• Performance troubleshooting
• Ability to debug basic applications
• Ability to participate in large scale collaborative troubleshooting sessions.
• Experience in supporting large enterprise solutions, spanning multiple geographic zones.
• Experience/knowledge in real-time databases (such as Oracle, Cassandra, Aerospike, etc.)
• Extensive experience with agile development methodologies and processes required.
• Must possess outstanding verbal and written communication skills, and be able to work with others effectively
• Strong experience with EKS & K8s.
• Strong understanding of DevOps & SRE
Recognized as a global leader in interactive and digital entertainment, Sony Interactive Entertainment (SIE) is responsible for the PlayStation® brand and family of products and services.