It is an exciting time to be part of SIE's Site Reliability Engineering (SRE) leadership team! SRE teams operate at the intersection of Software Engineering and Infrastructure Engineering. These teams strive to make the PlayStation Network Platform a highly reliable, scalable, operable and secure product and service. SRE Managers are responsible for leading and enabling teams of engineers who build and operate services that are "always on" and highly performant, and that form the foundation on which our platform is delivered to customers.
Are you passionate about leading best-in-class engineers?
Would you like to help solve large-scale observability challenges?
If you answered "yes" to these questions and you love finding creative ways to demonstrate new technologies – consider joining our team!
The Site Reliability Tools (SRT) team within SIE's Platform Hosting Engineering organization runs critical services used across SIE to provide visibility into the performance and availability of PlayStation Network services for our players, partners, and other customers. Our teams work closely with developers, operations teams, and leadership to ensure we have the right set of tools to generate, collect, analyze, visualize and alert on operational data, so we know exactly what is happening across the PlayStation ecosystem, can anticipate problems before they occur, and can address them as quickly as possible.
• Lead the teams of software and systems engineers to deliver critical logging, monitoring, tracing, and alerting services across SIE, and be directly responsible for PlayStation's stellar uptime record
• Drive teams to build automation to prevent problem recurrence and automate responses to errors and alerts
• Maintain relationships with 3rd party providers of software and services, to align partner roadmaps with the organization's needs
• Improve upon and deliver the vision and strategy for observability through collaboration with engineering, operations, and leadership across the entire PlayStation ecosystem
• Lead by example, care for your team, and establish credibility with the quality of your team's technical execution
• Ability to direct a team of managers and highly technical and skilled engineers developing software, delivering services, and operating critical systems at large scale
• Strong collaboration and communication skills with the ability to partner and influence other managers, engineers and executives
• Proven track record of building, growing, and leading technical teams that effectively deliver services following agile principles
• Fluency with running large-scale, high-performance, distributed systems while improving the "ilities" (reliability, availability, serviceability) of those systems
• Demonstrated experience following software engineering best practices
• Experience with automation, configuration management, and CI/CD operating in public cloud environments
• Knowledge of the software development lifecycle with experience integrating Open Source tools
• Ability to lead teams to solve complex issues across a cloud-based microservices environment
• Knowledge and experience with operating logging and monitoring systems at scale such as Splunk, CA, Datadog, CloudWatch, ELK, Sensu, Zabbix
• Strong belief in driving operational excellence, with efficiency and automation at the core of operations
• Passionate about automation and process improvements, with a desire to standardize tools and technologies
Required Soft Skills
• Desire to champion developer needs and integrate them into your teams' priorities
• Methodical and systematic problem-solving approach
• Execution oriented and results driven, demonstrating complete ownership of end-to-end solutions
• Customer and peer relationship focused with a proven ability to effectively partner with local and remote groups of internal customers
• Ability to thrive in a fast-paced team environment
• Ability to learn new skills/technologies quickly and independently
• BS in Computer Science, Software Engineering, or equivalent experience
• 12-15 years of professional experience operating systems at scale
• 7-10 years of experience leading teams
This role requires occasional travel to other SIE locations around the world.
Jobcode: Reference SBJ-g3k1y5-35-172-223-30-42 in your application.