Summary
The Senior Data Engineer is responsible for building and optimizing ETL/ELT pipelines that process terabytes of data daily across 186 data assets, implementing BigQuery datasets with enterprise-scale performance optimization, and creating the data quality monitoring and transparency dashboards that enable data-owner self-service.
Required Qualifications
Google Cloud Platform Data Engineering
- 5+ years of data engineering experience, including 2+ years focused on Google Cloud Platform
- Strong proficiency with BigQuery, including:
  - Advanced SQL for analytical queries (window functions, CTEs, complex joins)
  - Partitioning and clustering strategies for performance optimization
  - Materialized views, authorized views, and query optimization techniques
  - Cost optimization through efficient query design and storage management
  - Understanding of BigQuery architecture (slot allocation, shuffle operations, distributed execution)
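As a rough illustration of the partitioning, clustering, and window-function skills listed above, a candidate would be expected to write DDL and analytical SQL along these lines. This is a sketch only; the dataset, table, and column names (`analytics.events`, `event_ts`, `patient_id`) are hypothetical, and the statements are shown as Python string constants rather than executed against BigQuery:

```python
# Hypothetical BigQuery DDL: partition by event date to prune scanned
# bytes, and cluster by the columns most often used in filters/joins.
CREATE_EVENTS_TABLE = """
CREATE TABLE IF NOT EXISTS analytics.events (
  event_id   STRING,
  patient_id STRING,
  event_type STRING,
  event_ts   TIMESTAMP
)
PARTITION BY DATE(event_ts)
CLUSTER BY event_type, patient_id
"""

# Hypothetical analytical query: a CTE plus ROW_NUMBER() window function
# to pick the latest event per patient, with a partition filter so the
# query only scans a single day's partition (cost control).
LATEST_EVENT_QUERY = """
WITH ranked AS (
  SELECT
    patient_id,
    event_type,
    ROW_NUMBER() OVER (
      PARTITION BY patient_id ORDER BY event_ts DESC
    ) AS rn
  FROM analytics.events
  WHERE DATE(event_ts) = @run_date
)
SELECT patient_id, event_type
FROM ranked
WHERE rn = 1
"""
```

The partition filter in the `WHERE` clause is the kind of query-design habit the cost-optimization bullet refers to: without it, BigQuery scans every partition of the table.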
- Hands-on experience with Google Cloud Dataflow and Apache Beam:
  - Pipeline development in Python or Java
  - Batch and streaming data processing patterns
  - Performance tuning and resource optimization
  - Error handling and pipeline monitoring
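The error-handling bullet above typically means the dead-letter pattern: bad records are diverted with their error instead of failing the whole pipeline. The sketch below shows that pattern in plain stdlib Python rather than the Beam SDK (in Beam it would be a tagged side output); the record format and field names are hypothetical:

```python
# Library-free sketch of the dead-letter error-handling pattern used in
# Beam/Dataflow pipelines: good records flow downstream, bad records are
# captured alongside their error for later inspection.

def parse_record(raw: str) -> dict:
    """Parse a hypothetical 'id,amount' line; raises ValueError on bad input."""
    rec_id, amount = raw.split(",")
    return {"id": rec_id, "amount": float(amount)}

def run_batch(raw_records):
    """Split inputs into parsed outputs and a dead-letter list."""
    good, dead_letter = [], []
    for raw in raw_records:
        try:
            good.append(parse_record(raw))
        except ValueError as exc:
            # In Beam this would be a tagged side output, written to a
            # dead-letter table instead of crashing the worker.
            dead_letter.append({"raw": raw, "error": str(exc)})
    return good, dead_letter
```

Routing failures to a dead-letter sink is also what makes the monitoring bullet actionable: the size of the dead-letter output is a direct data-quality metric.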
- Proficiency with Cloud Composer (Apache Airflow):
  - DAG development and dependency management
  - Airflow operators (BigQueryOperator, DataflowOperator, custom operators)
  - Workflow orchestration for complex multi-step processes
  - Monitoring and troubleshooting failed workflows
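The dependency-management idea behind the Airflow bullets can be sketched without Airflow itself: a DAG schedules a task only after all of its upstream tasks have completed. The stdlib-only sketch below (task names are hypothetical, chosen to mirror a typical ELT flow) uses `graphlib.TopologicalSorter`, which is the same topological ordering Airflow's scheduler enforces via `task_a >> task_b` dependencies:

```python
from graphlib import TopologicalSorter

# Upstream dependencies per task, in the spirit of Airflow's
# `extract >> transform >> load_bigquery >> data_quality_check`.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load_bigquery": {"transform"},
    "data_quality_check": {"load_bigquery"},
}

def execution_order(dependencies):
    """Return a valid run order honoring every upstream dependency."""
    return list(TopologicalSorter(dependencies).static_order())
```

In a real Composer environment each task would be an operator instance (e.g. `BigQueryInsertJobOperator`), but the ordering guarantee being tested in interviews is exactly this one.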
ETL/ELT & Data Integration
- Strong experience building production-grade ETL/ELT pipelines processing terabyte-scale data
- Knowledge of data integration patterns (full refresh, incremental load, change data capture)
- Experience with data transformation techniques (normalization, denormalization, aggregation)
- Understanding of data quality frameworks and validation strategies
- Proficiency with schema evolution: handling changes without breaking downstream systems
- Experience with data lineage tracking from source to consumption
SQL & Database Technologies
- Expert-level SQL skills, including advanced analytics functions and query optimization
- Understanding of database performance tuning (indexing, partitioning, query plans)
- Experience with relational databases (PostgreSQL, MySQL, SQL Server) for source system integration
- Familiarity with NoSQL databases (Firestore, Bigtable) for specialized use cases
- Knowledge of data warehousing concepts (fact tables, dimension tables, slowly changing dimensions)
Programming & Scripting
- Strong Python proficiency for data pipeline development and scripting
- Experience with the Apache Beam SDK for Dataflow pipeline development
- Proficiency with Pandas and NumPy for data manipulation and analysis
- Understanding of object-oriented programming and software engineering best practices
- Experience with Git for version control and collaborative development
- Basic shell scripting for automation and operational tasks
Data Security & Compliance
- Understanding of row-level security implementation patterns in BigQuery
- Experience with PHI/PII data handling and healthcare compliance requirements (HIPAA preferred)
- Knowledge of data masking and de-identification techniques
- Understanding of audit logging and compliance reporting requirements
- Familiarity with least-privilege principles and data access controls
Preferred Qualifications
- Google Cloud Professional Data Engineer certification
- Healthcare industry experience with an understanding of clinical and administrative data
- Experience with Google Cloud Storage lifecycle policies and storage class optimization
- Knowledge of Cloud Spanner for transactional workloads
- Familiarity with the Cloud DLP API for automated data classification
- Experience with dbt (data build tool) for analytics engineering
- Understanding of data mesh or data fabric architectural patterns
- Background in DevOps practices and CI/CD for data pipelines
- Experience with Terraform for infrastructure as code
- Knowledge of data visualization tools (Looker, Tableau, Power BI)
- Familiarity with machine learning workflows on GCP (Vertex AI)
- Experience with Docker and Kubernetes (GKE) for containerized workloads