Senior Site Reliability Engineer (SRE), London
Client:
Preqin
Location:
London, United Kingdom
Job Category:
-
EU work permit required:
Yes
Job Reference:
102213cf0c20
Job Views:
89
Posted:
22.01.2025
Expiry Date:
08.03.2025
Job Description:
The Site Reliability Engineering (SRE) team at Preqin operates globally, supporting all of Preqin's services. Their responsibilities include designing, building, and maintaining infrastructure, middleware, and CI/CD systems to provide the best tools for internal teams. They blend problem-solving abilities with software and systems engineering to proactively construct fault-tolerant and secure systems, enhance observability, and aggressively automate tasks to reduce manual effort.
Employment Type: Permanent - Full Time
Location: London
Workplace type: Hybrid
What you’ll be doing:
* Use your site reliability expertise to design, operate and support Preqin’s infrastructure, middleware and internal services, improving their performance, availability, scalability, latency and efficiency.
* Drive technical excellence in everything we do, fostering a culture of data-driven reliability, monitoring and automation that follows SRE best practices.
* Work alongside development teams to design and build scalable, highly available services, while establishing effective build frameworks for continuous deployment and self-service automation.
* Work on incident resolution and engage with various teams (including third parties) for support escalation.
What you’ll bring to us:
* You have previously worked in AWS cloud administration, including services such as EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security Groups.
* You have expertise in containerisation with Docker and Kubernetes and are familiar with microservice architecture patterns. You can define container configuration and troubleshoot issues.
* You’re an expert with infrastructure provisioning and configuration management technologies such as Terraform and/or Ansible, as well as associated paradigms such as Infrastructure as Code and Immutable Infrastructure.
* You’re comfortable with building CI/CD pipelines in TeamCity/Jenkins/Concourse.
* You have good networking skills, including knowledge of routing & switching protocols as well as DNS, firewalling, load-balancing and global traffic management.
* You are familiar with, and able to install, configure and manage, various persistence technologies, including databases (SQL/NoSQL) and broker/queuing systems (Kafka, SQS), with knowledge of high availability (HA) and clustering.
* You are comfortable with various logging, monitoring and alerting platforms and have expertise in using (and, ideally, deploying) ELK, CloudWatch and Fluentd to enable forensic log analysis and system tuning, as well as data-driven performance analysis (SLIs/SLOs) and capacity planning.
* You are a competent Linux (multiple distributions) and Windows systems administrator, including storage management (LVM, RAID) and security best practices such as SSH, SSL/TLS, HMAC and IPS/IDS.