Motion Recruitment | Jobspring | Workbridge

Director, SRE & IT Operations

Boston, MA

Hybrid

Full Time

$130k - $200k

We are seeking an experienced, forward-thinking Director of Site Reliability Engineering (SRE) & IT Operations to lead our hybrid infrastructure, site reliability, and IT support initiatives. This role will oversee a growing team responsible for ensuring the high availability, performance, and scalability of our cloud-native and on-prem systems, with a strong emphasis on AWS cloud infrastructure.

As a key technology leader, you will drive operational excellence, champion DevOps and SRE best practices, and help shape the roadmap for infrastructure, incident response, observability, and IT service management across the organization.

Key Responsibilities
  • Lead and scale the SRE and IT Operations teams, fostering a culture of collaboration, ownership, and continuous improvement.

  • Own and evolve the monitoring, alerting, and incident response processes to improve system reliability and reduce MTTR.

  • Oversee the management and optimization of our AWS-based infrastructure (EC2, EKS, RDS, S3, CloudWatch, etc.), ensuring performance, cost-efficiency, and security.

  • Partner with Engineering, Security, and Product teams to align infrastructure goals with business objectives.

  • Develop and maintain disaster recovery plans, system documentation, and runbooks.

  • Implement infrastructure as code (IaC), automation pipelines, and modern DevOps/SRE practices to streamline operations.

  • Manage vendor relationships and oversee procurement and deployment of IT systems and tools.

  • Ensure compliance with industry regulations and internal security policies.

  • Set strategic direction for the tools, technologies, and operational processes behind IT support services and infrastructure.

Qualifications

Required:

  • 10+ years of experience in IT Infrastructure, SRE, DevOps, or Cloud Operations, with at least 5 years in a leadership or director-level role.

  • Strong expertise in AWS cloud services and cloud-native architectures.

  • Proven experience implementing and managing observability platforms (e.g., Datadog, Prometheus, Grafana, ELK).

  • Experience with automation and configuration management tools (e.g., Terraform, Ansible).

  • Excellent leadership, communication, and stakeholder management skills.

  • Deep understanding of security, compliance, and ITIL/ITSM best practices.

Preferred:

  • Experience working in a hybrid environment with both cloud and on-prem systems.

  • Background in regulated industries (e.g., healthcare, finance, insurance).

  • Familiarity with Kubernetes (especially EKS), CI/CD pipelines, and GitOps workflows.

Perks
  • Impactful Work: Lead mission-critical infrastructure that powers the organization's growth.

  • Collaborative Culture: Work in a supportive and innovation-driven environment.

  • Competitive Benefits: Health, dental, 401(k) match, generous PTO, and ongoing professional development.

  • Hybrid Flexibility: Maintain work-life balance with a flexible hybrid schedule.

Posted by: Michael Couhig
