Full Time Job

Sr. Incident and Problem Manager, Technical Operations

Audacy

Philadelphia, PA 11-23-2021
 
  • Paid
  • Full Time
  • Senior (5-10 years) Experience
Job Description
Overview

Location: This opportunity is available remotely; however, the majority of the team sits in Philadelphia and New York.

Audacy is looking for a Sr. Incident & Problem Manager to join our Technical Operations team within the digital business line. You will be responsible for driving complex outages to resolution in a timely and effective manner through coordination of internal teams and third-party vendors.

You will also be responsible for building and evolving the practice of Incident & Problem Management across Audacy's digital organization, closely partnering with internal teams and key stakeholders to drive remediation programs to closure. As part of this work, you will capture insights, drive Post Incident Reviews, and develop processes to reflect platform improvements globally. Your work in this role will use cutting-edge technologies and industry concepts to directly prevent customer and business-impacting events.

Responsibilities

What You'll Do:
• Create and develop organizational policies, processes, and procedures for Incident and Problem Management. This may include an annual review and approval of policies by appropriate management.
• Manage escalations in the Incident and Problem Management processes, including providing reporting and feedback to department leaders. Use discretion in identifying and responding to complex issues and assignments.
• Create and provide periodic Management Reporting (KPIs & SLAs) relating to Incident and Problem Management, including goal metrics that are specific, measurable, attainable, realistic, and time-bound.
• Facilitate and coordinate Incident and Problem Management service restoration bridges and meetings to ensure adherence to approved policies and procedures.
• Develop and deliver training and guidance on the Incident and Problem Management process for stakeholders, based on their roles and responsibilities.
• Conduct incident post-mortems and retrospectives after every major incident and create incident close-out reports. Collaborate with incident managers and Development teams to identify action items and track them to closure.
• Facilitate root cause analysis to identify countermeasures to prevent similar incidents.
• Perform trend analysis on problems, root causes, and countermeasures to identify patterns and themes. Report the analysis to management, with recommendations to address those problem themes.
• Perform quality assurance on completed incident, outage, problem investigation, and change management records.
• Proactively identify risks and issues; define and implement mitigation strategies.
• Perform a range of work, sometimes complex and non-routine, in a variety of environments. Support internal and external audit requests for Incident and Problem Management and related processes.
• Expected to share coverage for off-hour service restoration events on evenings and weekends, when required.
• While this role may begin as an individual contributor position, future team additions may bring people management responsibilities.

Qualifications

About You:
• ITIL Foundation v3 certification.
• Flexibility and willingness to work in a 24x7 operations environment.
• Advanced knowledge of incident, problem, and change management.
• Experience with Zendesk, Freshdesk, ServiceNow, Jira, or other CX and ITSM tools.
• 5+ years of relevant work experience, such as Business Continuity, Disaster Recovery, Incident Management, Risk, or Controls.
• Excellent management, interpersonal, communication, presentation, customer service, and organizational skills.
• Ability to remain calm during stressful situations; demonstrated leadership skills in fast-paced, highly dynamic situations.
• Ability to manage an incident/outage bridge with 20+ technical and business stakeholders.
• Coordination skills, including managing complex technical investigations.
• Familiarity with New Relic, Sumo Logic, Datadog, CloudWatch, or other monitoring/logging tools used in troubleshooting, identifying, and resolving incidents.
• Experience supporting Applications and Infrastructure in AWS preferred.

Preferred Skills & Qualifications
• Knowledge of industry-standard software best practices, development lifecycle processes, and Agile and Scrum methodologies.
• Strong understanding of HTTP networking (protocol, cookies, authentication, cache, security, and performance).
• Experience working in Cloud Support and Engineering organizations.
• Bachelor's Degree in Business Technology, Operations, Engineering, or related field.
• Knowledge of Domo, Tableau, or other data aggregation and reporting tools.

Vaccination Requirements

To help promote and ensure the safety of all of our employees and the communities we serve, we require all incoming employees to be fully vaccinated against the coronavirus. If hired, you will be required to provide proof of vaccination, which will be kept confidential. If you are currently unvaccinated or partially vaccinated but willing to become fully vaccinated, we still invite you to apply.

Audacy is committed to providing a safe and inclusive work environment for all, and we recognize that some individuals may have a sincerely held religious belief as it pertains to the practice of immunization or a medical disability which may prevent them from becoming vaccinated. Individuals who cannot be fully vaccinated due to a sincere religious belief or medical disability may request an accommodation. We request that you refrain from discussing religious beliefs or medical conditions during the interview process. If you believe you need an accommodation, you will have an opportunity to submit your request during the interview process. All requests for accommodation will remain confidential and will be reviewed by selected individuals within the HR department. We cannot guarantee that all requests for accommodation will be granted. Please refer to our EEO policy and statement below.

Jobcode: Reference SBJ-rb1vbo-3-14-133-148-42 in your application.