Job Description
Site Reliability Specialist (SRE)
In this role, you will ensure that the tools and infrastructures used for the studio's various activities are working properly. More specifically, you will guarantee that they are viable, stable, durable, and efficient.
A true chameleon, you will use your technical expertise and observation skills to manage the systems associated with different areas. In addition, you will act as a contact person to resolve and prevent incidents that may occur in these areas.
You will move nimbly between hands-on technical work and teaching to bridge development and operations. You will teach these teams best practices for testing, development, validation, and automation, resulting in stable, high-quality products delivered on time.
What you'll do
• Guide development teams in choosing technologies to improve the visibility, control, and robustness of systems.
• Automate processes as much as possible to make everyone's job easier.
• Implement tools and working methods for the safe, controlled deployment of services, and set up and improve incident management processes.
• Participate in diagnosing and correcting anomalies and failures related to tools and infrastructures.
• Coordinate resources to restore service and meet service level objectives.
• Design, deploy, secure, and maintain various reliable environments.
• Provide ongoing technical support and proactively address and resolve issues.
• Create and maintain deployment guides, and document infrastructure implementation, technical specifications, and problems encountered along with their solutions for future reference.
Qualifications
What you bring
• An undergraduate degree in computer science, computer engineering, or an equivalent field
• Extensive experience in software development, system administration, and database administration (or other relevant experience)
• Experience in infrastructure automation (cloud and on-premises)
• In-depth knowledge of programming languages; observability technologies (Grafana, Splunk, Elasticsearch, Prometheus, OpenTelemetry, etc.); development tools (Docker, GitHub, Terraform); CI/CD processes; cloud services; and network infrastructures
• Experience with configuration management software (Ansible, Chef, etc.) and system administration (Linux and Windows)
• Understanding of redundant and scalable architecture design; analysis and debugging; code optimization; and routine task automation
• The ability to work and adapt in a fast-paced environment
• Excellent interpersonal and communication skills coupled with a strong sense of teamwork
• A solution-oriented mindset and a capacity for analysis and synthesis
• An insatiable thirst for knowledge that keeps you up to date on technological advances
What to send our way
• Your CV highlighting your education, experience, and skills
Jobcode: Reference SBJ-r7pqj8-3-128-31-106-42 in your application.