The Solution, Reliability and Monitoring department’s main objective is to define, provide and support the production environments used by all of NAVBLUE’s customers.
As part of this team, the Site Reliability Specialist’s purpose is to ensure that the Production Infrastructure is up and running 24x7, troubleshooting the issues encountered by the infrastructure and continuously improving our ways of working.
The Site Reliability Specialist also interfaces with our Contractor, who manages the L1 monitoring and troubleshooting of our solution on a daily basis.
The Site Reliability Specialist needs solid knowledge of Operating Systems, Virtualization, Networking and cloud-based infrastructure, such as Amazon Web Services (AWS).
This position is expected to use established procedures, and create new ones, to analyze and resolve problems, and to perform moderately complex and varied tasks with minimal supervision.
Responsibilities:
* Contractor activities definition and follow-up:
o Create / review / update the Standard Operating Procedures (SOPs) for our Contractor,
o Train our Contractor on new SOPs, monitoring, or the onboarding of new components,
o Follow up on the Contractor's daily activities: report any failure through Root Cause Analysis (RCA), improve SOPs, and review and report on good and bad escalations,
o Support escalations.
* Support customer Disaster Recovery testing activities periodically.
* Document and improve hosting operation processes.
* Triage the tickets reaching the team from Monitoring, Customers or Projects.
* Support activities:
o Revert software deployments if an issue arises with the software after a deployment,
o Respond to escalations by fixing immediate issues that are critical for customers.
* Monitoring tool management:
o Adjust thresholds and alert levels,
o Measure system performance / responsiveness,
o Add checks/hosts in the different systems,
o Associate the SOPs with the checks created.
* Perform deployments and support the associated failures.
* Automate activities related to daily operations and deployments.
* Report on main events and activities, providing management with a summary of our system status.
* Ensure support continuity across the different time zones.
* Contribute to Service Level Objectives (SLO) definitions & monitoring.
* Internal release validation including upgrades & rollbacks.
* Internal process exercises such as:
o Disaster Recovery / Failover processes
o Backup / Restore validation
* COTS installation and management.
Experience:
* Solid experience (2-4 years) in a technical support team or in a similar technical support environment.
* Some experience in scripting and automation is an asset.
* Some experience in cloud-based infrastructure preferred.
Licensure/Certifications:
* AWS certifications “Cloud Practitioner” and “Solutions Architect – Associate” preferred.
* Other certifications are an asset.
Knowledge, Skills, Demonstrated Capabilities & Competencies:
* Solid knowledge of Operating Systems & ability to perform troubleshooting required.
* Solid knowledge of Cloud Technology concepts & ability to perform troubleshooting required.
* Solid knowledge of networking for enterprise environments required.
* Solid knowledge of Virtual Machine concepts and management of infrastructure.
* Demonstrated ability to identify the root cause of issues and to recommend permanent, long-term fixes.
* Demonstrated ability to perform standard troubleshooting in an AWS environment and to provide guidance to other teams.
* Proactive, confident self-starter with effective interpersonal and communication skills.
Technical Systems Proficiency:
* Advanced proficiency in the operation and support of Linux or MS Windows Operating Systems.
* Excellent working knowledge of Linux Core (Kernel, Modules & Dependencies).
* Excellent working knowledge of Windows domain administration, patch management, IIS management and Group Policies.
* Working knowledge of networking, shell scripting, MySQL, MS SQL, DNS, XML, Perl, and Palo Alto firewalls is an asset.
* Experience with cloud environments, such as AWS.
We offer:
* Stable employment based on a full-time job contract.
* International working environment in a dynamic company.
* Access to the latest knowledge and technologies enabling professional development.
* Training and development possibilities.
* Participation in international projects and international trips.
* Competitive salary dependent on experience and qualifications.
* Flexible working hours and work-from-home opportunities.
* Private medical coverage for you and your family.
* Sport card.
* Life insurance for you and your family.
* Co-funding for meals.
* Employee stock ownership plan.
How to Apply:
We thank all applicants for applying. Only selected applicants will be contacted.
NAVBLUE is committed to creating an environment and a culture where everyone feels like they belong no matter who they are or where they are from. We are committed to providing equal employment opportunities to all individuals based on job-related qualifications and ability to perform a job. We do not discriminate against any employee or applicant for employment because of race, colour, sex, age, national or ethnic origin, religion, sexual orientation, gender identity or expression, marital status, family status, genetic characteristics, record of offences, disability, or any other protected class. Accommodations will be available on request for candidates throughout the entire recruitment and selection process.