Director of SRE
Fort Myers, Florida
Open to Remote
Full Time
$250k - $260k
A local company is seeking a Director of Site Reliability Engineering (SRE) to lead and enhance its Azure-based infrastructure. The role is fully remote, with occasional visits to the Florida office. It is ideal for a seasoned SRE leader with deep expertise in Azure Cloud, Kubernetes, and observability tools.
Responsibilities
- Architect, scale, and optimize Azure cloud environments to ensure reliability and performance.
- Lead Kubernetes operations, including cluster management and automation.
- Implement and manage Datadog and PagerDuty for monitoring, alerting, and incident response.
- Define and enforce SRE best practices to improve system resilience and operational efficiency.
- Collaborate with engineering teams to streamline CI/CD pipelines and infrastructure automation.
- Drive incident management, post-mortems, and reliability improvements.
Qualifications
- Proven experience leading SRE teams in an Azure-focused environment.
- Strong expertise in Kubernetes, including deployment, scaling, and troubleshooting.
- Hands-on experience setting up and managing Datadog and PagerDuty.
- Deep understanding of cloud infrastructure, automation, and observability tools.
- Experience with CI/CD, infrastructure as code (Terraform, Bicep), and scripting.
- Excellent problem-solving and leadership skills.