Lead Data Engineer
California
100% Remote
Full Time
$120k - $145k
A leading SaaS company driving innovation in the sustainability sector is seeking a Lead Data Engineer with expertise in Databricks and PySpark to design and scale its data infrastructure. This role involves building high-performance ETL pipelines, optimizing large-scale data processing, and leading critical data initiatives. The ideal candidate has a strong Python background, excels in cloud-based data environments, and is comfortable taking ownership of projects from architecture to deployment.
Required Skills & Experience:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 5+ years of experience in data engineering, working with big data solutions.
- Hands-on expertise in Databricks and PySpark for data transformation and processing.
- Strong proficiency in Python for data engineering and automation.
- Deep experience with cloud data platforms (AWS, Azure, or GCP).
- Expertise in SQL for data modeling, query optimization, and performance tuning.
- Experience building scalable ETL/ELT pipelines in Databricks and cloud environments.
- Ability to lead projects, collaborate cross-functionally, and drive technical decision-making.
- Familiarity with Terraform or Infrastructure-as-Code (IaC) for data infrastructure management.
- Knowledge of Delta Lake architecture and real-time streaming solutions.
- Experience in data governance, security, and compliance best practices.
- Exposure to machine learning pipelines and analytics platforms.
- Background in industries with complex data ecosystems, such as utilities, climate tech, or IoT.
Tech Breakdown:
- 50% Building & Optimizing Data Pipelines with PySpark & Databricks
- 30% ETL/ELT Pipeline Design & Cloud Data Engineering
- 20% Project Leadership & Collaboration
Daily Responsibilities:
- 60% Developing & Optimizing Scalable Data Pipelines
- 20% Leading Projects & Collaborating with Teams
- 20% Ensuring Data Quality, Governance, and Performance