Observability/Log Management Engineer | London
Are you ready to write your next chapter?
Make your mark at one of the biggest names in payments. With proven technology, we process the largest volume of payments in the world, driving the global economy every day. When you join Worldpay, you join a global community of experts and changemakers, working to reinvent an industry by constantly evolving how we work and making the way millions of people pay easier, every day.
What you'll own
You will join a team of system administrators and engineers responsible for designing, implementing and maintaining System and Cloud Observability & Log Management solutions. These solutions ensure that our infrastructure and applications are fully observable, enabling proactive monitoring, real-time analytics, and timely incident response. The team plays a critical role in developing strategies and implementing best practices in observability and log management for on-premises and cloud environments. Your responsibilities may include:
* Implement and manage observability tools such as Splunk, Zabbix, and similar platforms for infrastructure, applications, and cloud services.
* Set up and configure dashboards, alerts, and reports that provide visibility into system health, performance, and availability.
* Develop and maintain centralized logging solutions to ensure comprehensive logging coverage, log retention, and log security.
* Work with IT, DevOps, and product teams to define key performance indicators (KPIs) and service-level objectives (SLOs) for critical systems and applications.
* Provide support in monitoring and troubleshooting production systems, using observability tools to identify performance bottlenecks, anomalies, and incidents.
* Assist in automating monitoring tasks and creating self-healing scripts to enhance system reliability.
* Analyze logs and telemetry data to provide insights for incident detection, root cause analysis, and performance optimization.
* Participate in on-call rotations, responding to incidents and using observability tools for rapid diagnosis and resolution.
* Collaborate with security teams to ensure log management solutions support security monitoring and incident investigation.
* Continuously evaluate and recommend improvements to observability and log management practices, tools, and processes.
What you bring
* Several years of experience in IT Operations, with a focus on observability and log management.
* Solid understanding of observability concepts, including metrics, log aggregation, log management, OpenTelemetry (OTEL) concepts and best practices, traces, event management and alerting.
* Hands-on experience with observability and monitoring tools (e.g., Splunk Enterprise, Splunk Cloud, Splunk Observability, OTEL agents, collectors, and gateways, Prometheus, Grafana, New Relic, Zabbix).
* Strong understanding of log management best practices, including centralized logging, data retention, and privacy requirements.
* Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and managing cloud-based monitoring solutions.
* Experience in designing and implementing system health dashboards, alerting mechanisms, and automated incident response processes.
* Experience developing Splunk queries and dashboards using Splunk Search Processing Language (SPL).
* Basic scripting skills (e.g., Python, Bash) for task automation and custom monitoring solutions.
* Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
* Excellent communication and collaboration skills, with the ability to work with cross-functional teams.
Added bonus if you have:
* A bachelor's degree or higher in computer science, information technology, or a related field. Practical experience in the role can be used in place of formal education.
* Knowledge of ITIL or similar frameworks for incident and problem management.
* Exposure to DevOps principles and experience with CI/CD pipelines.
* Experience in container monitoring (e.g., Kubernetes, Docker) and cloud-native architectures.
* Certifications:
o Technical certifications in cloud and virtualization technologies are highly valued, including certifications for Splunk, AWS, Azure, MCSE, Red Hat, VMware Certified Professional (VCP), VMware Certified Advanced Professional (VCAP), Citrix Certified Associate - Virtualization (CCA-V), Datadog, Dynatrace, or other observability tools.
What makes a Worldpayer
At Worldpay, we take our Values seriously, and we live them every day. Think like a customer, Act like an owner, and Win as a team.
* Curious. Humble. Creative. We ask the right questions, listening and learning to get better every day.
* Empowered. Accountable. Dynamic. We stay agile, using our initiative, taking calculated risks to progress.
* Determined. Inclusive. Open. Unlocking potential means working as one global community. We collaborate, always encouraging others to perform at their best.
Does this sound like you? Then you sound like a Worldpayer.
Apply now to write the next chapter in your career. We can't wait to hear from you.