Senior SRE / AWS / Observability
Irvine, California
Onsite
Full Time
$150k - $180k
This company is internationally recognized for its high-quality networking solutions and smart home innovations. It operates in more than 170 countries and is dedicated to enhancing everyday life through faster, more dependable connectivity. Known for its customer-first approach and commitment to excellence, it continues to grow its influence in both residential and commercial markets.
They are currently seeking a Senior Site Reliability Engineer to join their team on-site at their Irvine location. This role offers the opportunity to work on mission-critical cloud and microservices infrastructure, focusing on system reliability, automation, and performance optimization. You will play a vital role in driving observability, improving scalability, ensuring compliance, and supporting global product deployments within a dynamic and collaborative technical environment.
Required Skills & Experience
- Bachelor's degree in Computer Science, Information Systems, or a similar technical discipline.
- A minimum of five years’ experience working in Site Reliability Engineering or a closely related field.
- Strong coding and scripting abilities using languages such as Java, Python, Bash, or PowerShell.
- Proven experience in SRE, DevOps practices, cloud platform management, and security implementation.
- Act as a technical authority in deploying and maintaining microservices within cloud-native Kubernetes environments.
- Conduct performance and resiliency testing (e.g., load and chaos testing) to validate system robustness under various conditions.
- Implement end-to-end observability across distributed services hosted on platforms such as AWS, Azure, Google Cloud, and Oracle Cloud.
- Coordinate disaster recovery strategies, ensuring readiness through close collaboration with infrastructure and application teams.
- Diagnose and mitigate operational issues stemming from system resource limitations, such as CPU/memory constraints or inefficient auto-scaling configurations.
- Develop automation tools and scripts using languages such as Python, Go, or Bash to enhance operational efficiency.
- Define service-level indicators, objectives, and agreements (SLIs, SLOs, SLAs) in partnership with development teams to align technical performance with business expectations.
The Offer
You will receive the following benefits:
- Medical, Dental, and Vision Insurance
- 401(k) Retirement Savings Plan
- Free Snacks, Drinks, and Catered Lunch
- Free Gym Membership
Applicants must be currently authorized to work in the US on a full-time basis now and in the future.