Site Reliability Engineer - Hybrid Reston, VA
Arlington, Virginia
Hybrid
Full Time
$100k - $150k
Site Reliability Engineer
The candidate will be involved in all aspects of the data platform, including ideation, design, implementation, deployment, customer onboarding, and support. This requires regular cross-team collaboration with the Data Engineering, Infrastructure, Engineering, Security, and Operations teams. We expect the candidate to take ownership of the data platform, interacting regularly with internal customers and proactively identifying, prioritizing, and delivering on their common data platform needs.
The company is located in Reston, VA, and the role follows a hybrid work model.
What You Will Be Doing:
- Architecting, deploying, and managing large-scale data platforms (Kafka, Spark, Hadoop, Druid) running on top of Kubernetes
- Automating cluster provisioning (CI/CD), scaling, and monitoring using Ansible, Python, and Jenkins
- Participating in technical designs for software solutions that combine open-source, commercial, and custom-developed components
- Ensuring platform SLOs by collecting, visualizing, and alerting on relevant telemetry
- Upgrading large-scale data platforms to improve system capabilities and security while ensuring minimal customer impact
- Troubleshooting complex issues in large, distributed environments
- Staying up to date with industry best practices and standards for data platforms, focusing on hybrid cloud environments
- Supporting data platform customers
- Participating in the on-call rotation, monitoring production systems, and responding to incidents
What You Will Need:
- Bachelor’s degree in computer science or a related technical field, or equivalent combination of education and experience
- 5+ years of experience managing big data platforms (Hadoop, Spark, Kafka, Druid)
- Excellent understanding of Linux configuration and administration
- Strong automation experience: not just developing automation, but knowing why we automate and what to automate
- Strong understanding of infrastructure-as-code tools such as Ansible
- Experience with Docker or Kubernetes in a production environment
- Strong written and verbal communication skills – able to clearly and succinctly describe complex issues
This position doesn’t provide sponsorship.