Overview

How You'll Make an Impact

A subsidiary of Publicis Groupe, Epsilon is a leading provider of multi-channel marketing services, technologies, and database solutions. We do more than collect and store data, and we might be the most important Internet company you've never heard of. Join our team for your chance to work in the digital marketing space and solve meaningful problems on a massive scale, and have fun doing it.

The System and Platform Operations Manager is a technical leadership role responsible for the support, reliability and stability of Epsilon Retail Media production systems, environments and offerings. The team owns the reliability vision for the company, driving continuous improvement through a combination of development and operations initiatives as well as process excellence. This position and its team have solid-line responsibility for operations, including the deployment, management, monitoring, reporting, troubleshooting, and repair of production systems. Core to the success of the role is providing a premium customer support experience focused on a "center of excellence" that allows for a full-service delivery support cycle.

This role is responsible for managing the Platform Operations team centralized within a single geo-region, orchestrating the regional teamwork, providing both technical and professional support, and championing the company values. The Platform Operations Engineer works closely with the Engineering team to ensure ongoing system stability and supports the Technical Account Managers from an environment perspective. The Platform Operations team is responsible for supporting all retailers once they are live. Critically important is how this team collaborates and liaises with other teams such as Customer Support, Technical Account Management, Engineering and Customer Success.
What you'll do:

Operational Practices
Establish and manage operational practices, and ensure we design, implement and operate a support model that is fit for purpose for our future.
Implement proactive solutions for incident and problem detection, response, remediation and continuous improvement.
Own the operational integrity of all production environments.

Production Monitoring and Operational Reporting
Adopt a "Measure Everything" approach to ensure that internal service level objectives and customer service level agreements are exceeded, including executive-level reporting on operational health metrics such as SLAs, incident resolution, performance, availability, reliability and capacity.

Customer Support & Incident Management
Own incident management processes and on-call response.
Take ownership of complex issues related to performance, reliability and scalability, and lead the resolution of serious incidents and events, including communications with customers and wider stakeholders.

Change Management
Uphold processes and procedures to manage change across production platforms.
Provide insight and expertise on how customers will perceive changes and their impact, to drive customer organization change management and communication.
Empower the Delivery teams to release new products, features, updates and fixes quickly, while ensuring platforms remain reliable and stable.

System Reliability
Work with the wider Engineering, Product, Delivery and Security teams to ensure that appropriate attention is given to production and system reliability.
Establish Operational Practices in conjunction with the Product and Engineering teams (e.g. understanding how product feature development could affect the system's overall reliability and performance).
Provide delivery status information on System Reliability initiatives to the IT Leadership Team and additional stakeholders, and ensure proper communication concerning changes to agreed milestones, or challenges, risks and blockers that may affect the outcome or agreed completion dates (with proactive suggestions to resolve them).

IT Service Management
Execute Service Management processes, including Change, Config, Service Level, Performance, Incident and Problem Management, to deliver a high level of support and system availability.
Leverage industry standards and best practices for improving service levels and performance.
Uphold Customer Support standards in line with Service Level Agreements.
Ensure SLAs and KPIs are met to the best of your ability, with particular focus on first-level response times, escalation paths and resolution times.
Uphold the IT Service and Support workflow, with a particular focus on ensuring a best-in-class customer experience.
Deliver support and service solutions for the Group in line with industry best practice.
Work as a team to ensure all SLAs and practices are well defined, documented, and consistently applied and adhered to, in order to provide premium customer support services.

Organizational Capability
Identify the capabilities needed to meet the current and emerging business needs of a significant function.
Evaluate current capabilities, identify gaps, and prioritize development activities.
Embed personal development and the fulfillment of personal potential in the culture of the organization.
Build capabilities elsewhere in the organization through mentoring and other informal methods.

Technical Developments, Process Improvement and Simplification
Discuss and recommend more complex or innovative technical developments to improve the quality of software and supporting infrastructure to better meet users' needs.
As the subject matter expert on the team, maintain an understanding of current technology, database management, reliability practices, and future trends through ongoing education, conference attendance and industry press.
Ensure all processes and procedures are documented to ease continuous improvement activities.
Proactively identify new opportunities to drive improvements and simplification of our overall technology solutions.

Personal Capability Building
Develop your own capabilities by participating in assessment and development planning activities as well as formal and informal training and coaching; gain or maintain external professional accreditation where relevant to improve performance and fulfill personal potential.
Maintain an in-depth understanding of technology, external regulation, and industry best practices through ongoing education, attending conferences, and reading specialist media.

Who You Are

What you'll bring with you:
At least 5 years of hands-on experience in Site Reliability-focused positions.
Strong knowledge of containerization technologies (Docker, Kubernetes).
Experience with infrastructure as code (Terraform).
Solid understanding of networking, security, and system architecture.
Proficient in programming and scripting languages (Java, Golang, Python, Bash, or similar).
Experience with monitoring and observability tools (DataDog, Prometheus, Grafana).
Knowledge of database management systems (PostgreSQL, Bigtable).
Understanding of API and microservices architecture.
Strong people leadership skills, with at least a year leading and driving high-performance technical Operations teams within enterprise environments.
Knowledge of DevOps, ITIL, Cloud Services, and IT Infrastructure and Operations, including supporting and maintaining production and development environments and building cloud services that are secure, reliable, scalable and observable.
Experience implementing and managing Logging, Monitoring and Alerting frameworks.
Knowledge and experience of establishing deployment and automation pipelines.
Expertise with ITSM principles from previous positions held.
Excellent verbal and written communication skills, with the ability to talk about technology intelligently and passionately to all levels of an organization, including Developers, Architects and senior management (technical and non-technical).
Past experience establishing support strategies for SaaS or Cloud-based backends, with a particular focus on APM deployment (such as Dynatrace or other monitoring tools).
Experience with establishing Service Delivery strategies that align to new ways of working, including Agile.
Understanding of international requirements relating to data and information security.
Experience in the design, development and management of commercial technology contracts, technical service level agreements, and KPIs.
Experience of establishing and delivering IT support services in a high availability (HA) environment such as 24/7 operations.

Why you might stand out from other talent:
Google Cloud Architect or Engineer certification preferred.
Certificates in relevant Database Management Systems, the referenced programming languages and scripting tools, or similarly related subject matter.
Bachelor's degree or equivalent.

Additional Information

When You Join Us, We'll Create Something EPIC Together

Epsilon is a global data, technology and services company that powers the marketing and advertising ecosystem.
For decades, we've provided marketers from the world's leading brands the data, technology and services they need to engage consumers with 1 View, 1 Vision and 1 Voice. 1 View of their universe of potential buyers. 1 Vision for engaging each individual. And 1 Voice to harmonize engagement across paid, owned and earned channels. Epsilon's comprehensive portfolio of capabilities across our suite of digital media, messaging and loyalty solutions bridges the divide between marketing and advertising technology. We process 400 billion consumer actions each day using advanced AI and hold many patents of proprietary technology, including real-time modeling languages and consumer privacy advancements. Thanks to the work of every employee, Epsilon has been consistently recognized as industry-leading by Forrester, Adweek and the MRC. Epsilon is a global company with more than 9,000 employees around the world.

Epsilon has a core set of 5 values that define our culture and guide us to create value for our clients, our people and consumers. We are seeking candidates who align with our company values, demonstrate them and make them meaningful in their day-to-day work:
Act with integrity. We are transparent and have the courage to do the right thing.
Work together to win together. We believe collaboration is the catalyst that unlocks our full potential.
Innovate with purpose. We shape the market with big ideas that drive big outcomes.
Respect all voices. We embrace differences and foster a culture of connection and belonging.
Empower with accountability. We trust each other to own and deliver on common goals.

Because You Matter

We know that we have some of the brightest and most talented employees in the world, and we believe in rewarding them accordingly. If you work here, expect competitive compensation, a great benefits package and endless opportunities to advance your career.
We offer hybrid working opportunities, with our office space located in the iconic Television Centre, White City.

As part of our dedication to enhancing our inclusive and diverse workforce, Epsilon is committed to equal access to opportunity for people without regard to race, age, sex, disability, neurodiversity, sexual orientation, gender identity, pregnancy and maternity, marriage and civil partnership, or religion or belief. We are committed to providing reasonable adjustments for candidates in our application process.