Are you interested in shaping the future of infrastructure, automation, and reliability at a leading fintech? We’re on the lookout for a Senior Site Reliability Engineer who thrives on tackling complex challenges, building scalable systems, and leading the charge in creating a world-class engineering ecosystem. This is your chance to join a global team dedicated to driving innovation and empowering our internal teams with cutting-edge solutions.
The Role:
* Architect and enhance the infrastructure and middleware to deliver top-tier performance, scalability, and reliability.
* Champion a culture of proactive monitoring, automation, and data-driven reliability, following SRE best practices.
* Collaborate with development teams to design resilient services and streamline continuous deployment frameworks.
* Dive into incident resolution, manage escalations, and work with cross-functional teams (and external partners) to keep systems running smoothly.
Experience/Skills:
* Deep knowledge of AWS services like EC2, S3, RDS, Lambda, Route 53, and more. You know your way around IAM and have experience with auto-scaling, security groups, and CloudFormation.
* Hands-on experience with Kubernetes and Docker, and a strong understanding of microservice architecture. You can configure, troubleshoot, and optimize containers like a champ.
* Fluency in Terraform, Ansible, or similar tools, with a knack for building immutable, automated infrastructure.
* Proficient in creating robust pipelines using tools like Jenkins, TeamCity, or Concourse.
* Comfortable with languages like Python, Golang, Bash, or PowerShell, and experienced in version-controlled environments (GitHub, Bitbucket).
* Solid understanding of routing, switching, DNS, firewalls, load balancing, and global traffic management.
* Familiarity with NoSQL/SQL databases, queuing systems (Kafka, SQS), and designing for high availability and clustering.
* Skilled with tools like ELK, Fluentd, or CloudWatch for performance tuning, forensic analysis, and capacity planning.
* Strong Linux and Windows administration skills, including storage and security best practices such as SSH, TLS, and IPS/IDS.