Cloud Operations Engineer
Reporting to the Director Center Operations (UK or US), this position is critical to implementing the vision for the 24/7 Technology Operations Centre organizational unit. The post holder will ensure the delivery and support of IT Event, Incident and Request Management processes at Discovery, ensure a culture of excellence and consistent service, and support Discovery's best-in-class distribution and infrastructure.
Highlights of the role
Utilizing strong cross-functional alignment, this position will be responsible for leading unified "virtual" teams made up of members from multiple disciplines (Distribution Operations, Network Operations, Enterprise Platform Operations, Digital Platform Support Operations, and Security Operations) to form a dynamic, technology-focused team capable of monitoring all Discovery linear and non-linear output and platforms, as well as supporting IT infrastructure. The support operation also leads Discovery's response to all incidents within our infrastructure.
The organizational scope of the Duty Operations Manager is day-to-day oversight of staff and services when on duty within our London, Krakow or Sterling, VA centers – circa 16 staff across multiple disciplines. Staff can be on duty at one or several sites at once. In addition, the role holder will manage a group of 6 to 10 staff within a functional group based on their skill group (Distribution Operations, Network Operations, Enterprise Platform Operations, or Digital Platform Support Operations). These groups are based in 3 locations, each led by an Operations Center Director, supporting 50 offices and production centers globally.
The position is key to ensuring organizational improvements, consistently maintaining and improving our customer feedback system, and establishing effective performance measurements to show successes and areas of opportunity.
This position is a mission-critical role in leading GT&O's global Technology Operations Centers. As a function, it is the 24/7 Command & Control Hub for all our Distribution and IT support services. Post holders will be expected to work shifts, including weekends. Regular night work is not required, but the post holder should expect to work nights during certain events.
The centers are the point of contact and owners for Major Incidents, and the postholder is responsible for the execution of the Major Incident process and procedures for Global Technology & Operations and the department's recovery plans. The centers are also the focal point of contact for Global IT's response in the event of an organization-wide Major Incident.
This position is a member of the leadership team for Technology Operations and will guide the development of the team, and communicate the direction of the organization.
• Day-to-day support leadership of cross-discipline staff, owning real-time Tier 1 and Tier 2 responses to active requests and incidents where needed. Lead a "virtual" center operation to ensure the organization is properly staffed 24/7 across the two sites
• Support the creation of an integrated technology support organization based on collaboration, best practices, standards, efficiency, and commitment to effective service delivery and responsiveness to the needs of the business
• Drives continuous improvement to key metrics through process, system, and organizational refinements
• Ensure that event, incident, major incident and problem management processes are implemented effectively
• Acts as a major incident lead when on duty. Is a focal point of communication to GTO and Digital leadership during a major incident. Vets potential critical outages, and determines and prioritises their status
• Creation and ownership of multiple, trust-wide business continuity reports, including disaster recovery
• Reporting on serious incident status, including customer notifications. Ensures communications are accurate, timely and appropriately messaged for multiple audiences.
• Responsible for the production of the post-incident report and accountable for any remediation planning that results from lessons learnt
• Contributes to root cause analysis reports and is accountable for any remediation planning that results from lessons learnt.
• Responsible for planning onsite cover during serious incidents, ensuring staff are directed to the correct area of focus
• Works with the Engineering and Product Teams to ensure the contingency plans are kept up-to-date and available.
• Planning onsite 3rd party presence during serious incidents
• To manage staff in a matrix organisation, particularly overnight as the senior manager onsite – supervising a wide range of staff in a highly stressful situation
• Guide and mentor the Operations teams and user base in the event of serious incidents
• Develops and implements effective maintenance and backup regimes across systems and infrastructure
• Maintains KPI and service metrics framework. Ensures service levels and targets are adhered to and that corrective measures are in place to maintain performance targets
• Maintains skills and career path framework for centre staff. Ensures these are in place for all staff
• Lead and deliver small to midsize projects or organisational change within operations centre scope
• Partner with relevant GT&O and Digital leadership teams on technology implementation. Ensure impacts on the department are understood and that mechanisms are in place to manage these impacts and ensure service continuity
• Partner with GT&O & Digital Sports and Olympics teams for Sport Events and Olympic Games.
• Responsible for implementing effective business continuity and disaster recovery plans. Leads an integrated technology support organization based on collaboration, best practices, standards, efficiency, and commitment to effective service delivery and responsiveness to the needs of the business
• Ensuring communications are accurate, timely and appropriately messaged for multiple audiences
• Working as part of a small team, ensuring efficient handover and seamless delivery of support
• Develop and maintain strong working relationships with key business leads and senior stakeholders within the customer base
• Develop and maintain strong working relationships across all IT disciplines
• Develop and maintain strong working relationships across GT&O
• Bachelor's degree in Business Administration, Broadcast Engineering, IT Management, or equivalent work experience
• 3+ years' direct management experience in the IT or Broadcast Support function
• 5+ years' experience working in one or more of the following areas: Distribution Operations, Platform Operations, Network Operations, Digital Platform Operations
• 5+ years' experience in an Enterprise-level support environment. Experience in a service delivery environment and understanding of technical support processes and workflow
• Working knowledge of ITIL required. Foundation certification expected. Must be able to effectively communicate with owners of ITIL Disciplines (Incident, Problem, Change, Release, and Configuration) to provide effective IT support to the end-users.
• Excellent verbal, written, interpersonal communication and customer service skills
• Strong organizational and conceptual skills
• Ability to multi
Discovery, Inc. is the global leader in real life entertainment. We serve passionate fans with content that inspires, informs, and entertains, providing leadership across deeply loved and trusted brands, such as Discovery Channel, TLC, Animal Planet, HGTV, Food Network, and Travel Channel.