As a Monitoring and Observability Specialist, you will be responsible for overseeing and optimizing our monitoring and observability infrastructure, including tools like Splunk and Dynatrace. Your role will be crucial in ensuring the reliability, performance, and security of our systems and applications, enabling proactive identification and resolution of issues.
Key Responsibilities:
1. Tool Implementation and Configuration:
- Implement and configure monitoring and observability tools such as Splunk, Dynatrace, and other relevant solutions to provide comprehensive visibility into the performance and health of our systems.
- Customize monitoring dashboards and alerts to meet the specific needs of different teams and applications.
2. Performance Monitoring:
- Monitor the performance of systems, applications, and infrastructure components in real time to identify bottlenecks, anomalies, and areas for optimization.
- Conduct performance analysis and capacity planning to ensure the scalability and efficiency of our IT environment.
3. Incident Response and Troubleshooting:
- Proactively identify and respond to incidents and service disruptions by analyzing monitoring data and alerts.
- Collaborate with cross-functional teams to troubleshoot and resolve issues in a timely manner, minimizing impact on business operations.
4. Security Monitoring:
- Monitor security events and logs to detect and respond to security threats and vulnerabilities.
- Work closely with the cybersecurity team to ensure compliance with security policies and standards.
5. Documentation and Reporting:
- Maintain comprehensive documentation of monitoring configurations, processes, and procedures.
- Generate regular reports on system performance, availability, and reliability for stakeholders and management.
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Proven experience in implementing and managing monitoring and observability tools in an enterprise environment.
- Strong proficiency in Splunk, Dynatrace, and other monitoring tools, including dashboard creation, query optimization, and alert configuration.
- Solid understanding of IT infrastructure components, including networks, servers, databases, and cloud platforms.
- Experience with scripting and automation for monitoring tasks (e.g., Python, PowerShell).
- Excellent analytical and problem-solving skills, with the ability to troubleshoot complex issues under pressure.
- Strong communication and collaboration skills, with the ability to work effectively across teams and departments.
Preferred Skills:
- Certifications in relevant monitoring and observability tools (e.g., Splunk Certified Admin, Dynatrace Associate).
- Experience with log management and analysis tools (e.g., ELK Stack, Sumo Logic).
- Knowledge of DevOps practices and tools for continuous integration and continuous deployment (CI/CD).