Azure Site Reliability Engineer II / Sandy Springs, GA / Hybrid
Sandy Springs , Georgia
Onsite
Full Time
$145k - $150k
Exciting opportunity in Sandy Springs, GA! This company sells software for an e-commerce website focused in the retail industry. They are seeking an experienced Azure Site Reliability Engineer to join their team. This is an On-Site position and is a full-time role. In this role, you'll work with cutting-edge technologies such as Azure Services and Datadog!
Our client is looking for hard-working individuals who work well on a team. Here, you will have the chance to grow your skills, work on meaningful projects, and enjoy a supportive work-life balance. If you are ready to grow your skills, then this is the place for you!
Required Skills & Experience
• Proficiency with Azure services
• Strong experience with Datadog
• 5+ YOE with Site Reliability
• Proficient in scripting languages like Python, PowerShell, or Bash
• Strong skills in diagnosing, troubleshooting, and optimizing system performance issues across large-scale environments.
Desired Skills & Experience
• Knowledge of Datadog integrations for Azure services, Kubernetes, and CI/CD pipeline monitoring.
• Familiarity with managing and optimizing databases such as Azure SQL, Cosmos DB, or MySQL.
• Knowledge of SRE principles such as error budgets, automation, and incident postmortems.
• Familiarity with IaC (Terraform and Ansible)
• Understanding of compliance standards (ISO, SOC 2, GDPR) and security practices specific to cloud environments.
What You Will Be Doing
Tech Breakdown:
• Core services: Azure Kubernetes Service (AKS), Azure Functions, Azure App Services.
• Set up Datadog to monitor Azure resources, including Virtual Machines, AKS clusters, and storage accounts.
• Use Datadog’s dashboards and anomaly detection features to proactively detect and resolve system issues before they impact users.
• Monitor deployments through Datadog to detect any application errors or performance issues introduced during updates.
• Develop and optimize CI/CD pipelines for efficient, reliable application deployment.
Daily Responsibilities
• Automate resource provisioning and deployment with IaC tools like Terraform or ARM templates.
• Continuously monitor Azure infrastructure and applications using Datadog for performance, uptime, and resource utilization.
• Use Infrastructure as Code (IaC) tools like Terraform or Ansible to provision, update, and manage cloud infrastructure.
• Develop, maintain, and improve CI/CD pipelines to automate Docker image builds and Kubernetes deployments.
• Respond to system alerts, production issues, and incidents. Work to resolve outages quickly and perform root cause analysis to prevent future incidents.
The Offer
• Bonus OR Commission eligible
You will receive the following benefits:
• Medical, Dental, and Vision Insurance
• Vacation Time
• 401(k) with a company match, commuter benefits, paid holidays, PTO, quarterly bonuses, and more
• Health Insurance
Applicants must be currently authorized to work in the US on a full-time basis now and in the future.
Our client is looking for hard-working individuals who work well on a team. Here, you will have the chance to grow your skills, work on meaningful projects, and enjoy a supportive work-life balance. If you are ready to grow your skills, then this is the place for you!
Required Skills & Experience
• Proficiency with Azure services
• Strong experience with Datadog
• 5+ YOE with Site Reliability
• Proficient in scripting languages like Python, PowerShell, or Bash
• Strong skills in diagnosing, troubleshooting, and optimizing system performance issues across large-scale environments.
Desired Skills & Experience
• Knowledge of Datadog integrations for Azure services, Kubernetes, and CI/CD pipeline monitoring.
• Familiarity with managing and optimizing databases such as Azure SQL, Cosmos DB, or MySQL.
• Knowledge of SRE principles such as error budgets, automation, and incident postmortems.
• Familiarity with IaC (Terraform and Ansible)
• Understanding of compliance standards (ISO, SOC 2, GDPR) and security practices specific to cloud environments.
What You Will Be Doing
Tech Breakdown:
• Core services: Azure Kubernetes Service (AKS), Azure Functions, Azure App Services.
• Set up Datadog to monitor Azure resources, including Virtual Machines, AKS clusters, and storage accounts.
• Use Datadog’s dashboards and anomaly detection features to proactively detect and resolve system issues before they impact users.
• Monitor deployments through Datadog to detect any application errors or performance issues introduced during updates.
• Develop and optimize CI/CD pipelines for efficient, reliable application deployment.
Daily Responsibilities
• Automate resource provisioning and deployment with IaC tools like Terraform or ARM templates.
• Continuously monitor Azure infrastructure and applications using Datadog for performance, uptime, and resource utilization.
• Use Infrastructure as Code (IaC) tools like Terraform or Ansible to provision, update, and manage cloud infrastructure.
• Develop, maintain, and improve CI/CD pipelines to automate Docker image builds and Kubernetes deployments.
• Respond to system alerts, production issues, and incidents. Work to resolve outages quickly and perform root cause analysis to prevent future incidents.
The Offer
• Bonus OR Commission eligible
You will receive the following benefits:
• Medical, Dental, and Vision Insurance
• Vacation Time
• 401(k) with a company match, commuter benefits, paid holidays, PTO, quarterly bonuses, and more
• Health Insurance
Applicants must be currently authorized to work in the US on a full-time basis now and in the future.