Site Reliability Engineer - Hybrid Reston, VA

Arlington, Virginia

Hybrid

Full Time

$100k - $150k

Site Reliability Engineer

The candidate will be involved in all aspects of the data platform, including ideation, design, implementation, deployment, customer onboarding, and support. This involves regular collaboration with the Data Engineering, Infrastructure, Engineering, Security, and Operations teams. As part of the team, the candidate is expected to take ownership of the data platform, interacting regularly with internal customers and proactively identifying, prioritizing, and delivering on their common data platform needs.

The company is located in Reston, VA, and the role follows a hybrid model.

What You Will Be Doing:

Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes
Automating cluster provisioning (CI/CD), scaling, and monitoring using Ansible, Python, and Jenkins
Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components
Ensuring platform SLOs are met by collecting, visualizing, and alerting on relevant telemetry
Upgrading large-scale data platforms to improve system capabilities and security with minimal customer impact
Troubleshooting complex issues in large, distributed environments
Staying up to date with industry best practices and standards for data platforms, with a focus on hybrid cloud environments
Supporting data platform customers
Participating in the on-call rotation, monitoring production systems and responding to incidents

Required Skills & Experience:

Bachelor’s degree in computer science or a related technical field, or equivalent combination of education and experience
5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
Excellent understanding of Linux configuration and administration
Strong automation experience: not just developing automation, but knowing why we automate and what to automate
Strong understanding of infrastructure-as-code tools such as Ansible
Experience with Docker or Kubernetes in a production environment
Strong written and verbal communication skills, with the ability to describe complex issues clearly and succinctly

Applicants must be authorized to work in the United States on a full-time basis, now and in the future.
This position doesn’t provide sponsorship.

Posted by: Ashton Corbett

Specialization:

DevOps

Related Jobs

Infrastructure Engineer - Hybrid Reston, VA

Site Reliability Engineer - 5 days onsite New York, NY

DevOps Engineer/Systems Administrator - 5 days onsite Reston, VA

Cloud Engineer III - Hybrid Webster, MA

Senior DevOps Engineer - 100% Remote

Forward Deployment Engineer - Active TS/SCI Clearance

Detection Engineer - Active Secret Clearance 100% Remote

DevSecOps Engineer - AWS/Kubernetes/Docker/Security Engineering/Ansible

Lead Platform Engineer

Site Reliability Engineer

Senior Site Reliability Engineer (SRE)