Job Description: The Systems Reliability Engineering (SRE) Senior Lead is a pivotal leader within our organization, responsible for ensuring the reliability, performance, and scalability of our critical systems. This role is instrumental in strategizing and overseeing reliability with an end-to-end service delivery perspective, aligning technical infrastructure with business objectives to meet evolving customer needs. As an influential figure in our company, the Systems Reliability Engineering Senior Lead will spearhead initiatives to automate infrastructure, enhance system observability, and drive the transformation of our IT operations.
What are we looking for?
1. Bachelor’s degree in Information Technology, Computer Science, Business Management, or a related field
2. 7+ years of experience in IT departments or a relevant field
3. 3+ years in a leadership, SRE, DevOps, or systems engineering role
4. Deep understanding of Site Reliability Engineering (SRE) principles, DevOps best practices, and cutting-edge technologies, backed by seasoned professional experience
5. Strong analytical, interpersonal, and organizational skills with a proven track record in issue and problem management in a multicultural and global environment
6. Proficiency with cloud platforms and experience in configuration management, scripting, and monitoring and observability tools
7. Understanding of business processes, change management, and ITSM processes, including service level management and reporting
8. Excellent communication skills and the ability to work collaboratively with cross-functional teams
What will be your key responsibilities?
The Systems Reliability Engineering Senior Lead ensures that the technology stack being deployed can be supported in line with business requirements, with a focus on the infrastructure tech stack and the IT Operations support model:
System Reliability, Performance and Best Practices:
1. Design, implement, and maintain highly available and scalable systems
2. Monitor system performance, reliability, and security using advanced monitoring and logging tools
3. Proactively identify and resolve issues that could impact service availability
4. Conduct assessments to ensure systems comply with market standards and best practices
Automation and Infrastructure as Code (IaC):
1. Develop and maintain automated CI/CD pipelines to streamline deployments
2. Implement Infrastructure as Code (IaC) using tools like Terraform, Ansible, or others
3. Automate repetitive tasks to increase system efficiency and reliability
Collaboration and DevOps Culture:
1. Collaborate with software development teams to ensure new features are built with reliability in mind
2. Advocate for best practices in software engineering, deployment, and operations
3. Foster a culture of collaboration and continuous improvement across teams
Capacity Planning and Scaling:
1. Conduct capacity planning to anticipate future growth and scaling needs
2. Implement strategies to efficiently scale systems based on demand
What can you expect from Mars?
1. Work with over 140,000 diverse and talented Associates, all guided by the Five Principles
2. Join a purpose-driven company, where we’re striving to build the world we want tomorrow, today
3. Best-in-class learning and development support from day one, including access to our in-house Mars University
4. An industry competitive salary and benefits package, including company bonus