Senior SRE
Glendale, California
Full Time
$170k - $200k
Job Description
A large entertainment company is hiring a Senior SRE. The role follows a hybrid schedule, with 3 days on site in Glendale, CA.
The Systems Reliability Engineering (SRE) team helps build and deliver the robust software platforms that support immersive experiences. SREs take a software engineering-driven approach to designing, automating, scaling, and maintaining applications and infrastructure. This includes building and operating systems in close alignment with business needs, using resilient, scalable, and efficient architecture patterns. These engineers are passionate about quality, driven by data, and committed to continuous improvement through instrumentation, automation, and comprehensive testing strategies.
Required Skills & Experience
- Bachelor’s degree in Computer Science or a related field, with 7+ years of industry experience
- Deep expertise in Linux and Windows systems administration
- Hands-on experience with CI/CD tools (e.g., GitHub Actions, GitLab CI), automation frameworks (e.g., Terraform, Ansible, Chef), and scripting/development in Python, Ruby, or Go
- Proficient in cloud automation using tools like Boto, CloudFormation, and Terraform
- Strong understanding of source control, cloud platforms (AWS, Google Cloud, Azure), containers, and DevOps best practices
- Skilled in system performance optimization, monitoring, capacity planning, and documentation
- Ability to collaborate with cross-functional technical teams to plan, design, troubleshoot, and support high-impact systems
- Experience installing and configuring both Linux and Windows Server operating systems
- Familiar with container platforms such as Docker, ECS, and Kubernetes
- Proficient in Infrastructure-as-Code practices using tools like Terraform, CloudFormation, Bicep, or Pulumi
- Solid knowledge of source control systems (Git) and CI pipelines (GitLab CI, GitHub Actions)
- Recognized expertise in at least one OS and capable across several platforms, including performance tuning, configuration, and diagnostics
- Skilled in managing and troubleshooting web technologies: Java, Node.js, Tomcat, IIS, Apache, nginx, MySQL, PostgreSQL, etc.
- Strong grasp of networking fundamentals: HTTP, load balancing, DNS, SNMP, REST, VIPs, and security zones
- Ability to apply and implement platform standards around:
  - Site and application monitoring
  - Infrastructure resilience and reliability
  - System and performance instrumentation
- Experience scripting and developing tools in Bash, Python, Ruby, or Go to automate operational tasks
- Proficient in at least one modern programming language (Python, PHP, Ruby, Java, Go, Swift, or C++) and capable of writing unit tests
- Experience supporting or building backend tools/services
- Able to analyze and interpret results from load testing in moderately complex systems
- Strong diagnostic and troubleshooting abilities; can guide others in these areas
- Proven ability to resolve complex incidents independently and proactively propose system or application improvements
- Capable of evaluating infrastructure solutions for scalability, performance, and feasibility
- Comfortable navigating and adapting to evolving technologies and workflows
- Excellent communication skills across all organizational levels
- Acts as a primary technical contact for leadership
- Demonstrates a passion for learning and staying current with emerging technologies
- Master’s degree in Computer Science or related discipline, or equivalent professional experience in systems engineering and technical operations
Daily Responsibilities
- Design, build, and maintain automated deployment pipelines, cloud environments, and infrastructure that support immersive guest experiences in both physical and digital spaces
- Thrive in a fast-paced, highly collaborative environment with multiple stakeholders and teams
Key Responsibilities
- Lead architecture discussions, contribute to system planning, and collaborate with stakeholders to define engineering solutions
- Develop, integrate, and configure tools, systems, and cloud infrastructure to support operational needs