Key Responsibilities:
- Develop ingestion pipelines (batch and streaming) to move data to S3 (a PySpark sketch follows this list).
- Convert HiveQL to SparkSQL / PySpark (see the conversion example below).
- Orchestrate workflows using MWAA (Airflow); a minimal DAG sketch follows the list.
- Build and manage Iceberg tables with proper partitioning and metadata (see the Iceberg sketch below).
- Perform job validation and implement unit testing (a pytest sketch closes the examples below).
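
For orientation, here is a minimal sketch of the kind of batch ingestion job the first responsibility describes: read source files, apply light normalization, and land the result in S3 as Parquet. The bucket names, paths, and column names are hypothetical placeholders, not part of this posting.

```python
# Minimal batch ingestion sketch (hypothetical paths and columns).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-ingest-to-s3").getOrCreate()

# Read raw CSV files from a hypothetical landing bucket.
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-landing-bucket/orders/2024-01-01/")
)

# Light normalization: stamp the ingest date and drop duplicate keys
# (assumes an order_id column exists in the source data).
cleaned = (
    raw.withColumn("ingest_date", F.current_date())
       .dropDuplicates(["order_id"])
)

# Land the curated data in S3, partitioned by ingest date.
(
    cleaned.write
    .mode("append")
    .partitionBy("ingest_date")
    .parquet("s3://example-curated-bucket/orders/")
)
```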
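
The HiveQL-to-PySpark conversion work typically takes one of two forms: running the statement nearly as-is through SparkSQL, or rewriting it with the DataFrame API. The sketch below shows both for an illustrative aggregate; the table and column names are hypothetical.

```python
# Illustrative HiveQL-to-PySpark conversion (hypothetical table/columns).
#
# Original HiveQL:
#   SELECT customer_id, SUM(amount) AS total_amount
#   FROM sales
#   WHERE sale_date >= '2024-01-01'
#   GROUP BY customer_id;
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hiveql-to-pyspark")
    .enableHiveSupport()
    .getOrCreate()
)

# Option 1: run the statement nearly unchanged through SparkSQL.
totals_sql = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY customer_id
""")

# Option 2: express the same logic with the DataFrame API.
totals_df = (
    spark.table("sales")
    .filter(F.col("sale_date") >= "2024-01-01")
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
)
```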
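
For the MWAA responsibility, a common pattern is an Airflow DAG that submits a Spark step to a running EMR cluster and waits for it to complete, using the Amazon provider's EMR operators. The cluster ID, script path, and schedule below are assumptions for illustration only.

```python
# Minimal MWAA (Airflow) DAG sketch: submit a Spark step to EMR and
# wait for it. Cluster ID, S3 paths, and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [{
    "Name": "ingest_orders",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://example-bucket/jobs/ingest_orders.py"],
    },
}]

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    add_step = EmrAddStepsOperator(
        task_id="add_step",
        job_flow_id="j-XXXXXXXXXXXXX",  # hypothetical EMR cluster ID
        steps=SPARK_STEP,
    )

    # EmrAddStepsOperator returns the new step IDs via XCom.
    wait_for_step = EmrStepSensor(
        task_id="wait_for_step",
        job_flow_id="j-XXXXXXXXXXXXX",
        step_id="{{ task_instance.xcom_pull(task_ids='add_step')[0] }}",
    )

    add_step >> wait_for_step
```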
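
The Iceberg responsibility amounts to defining tables with an explicit partition spec and letting Iceberg manage partition and file metadata. A sketch via SparkSQL follows; the catalog name ("glue_catalog"), database, and schema are assumptions, and the Spark session is assumed to be configured with an Iceberg catalog of that name.

```python
# Sketch: create an Iceberg table with explicit partitioning and append
# to it. Catalog, database, table, and schema are hypothetical; the
# session must be configured with an Iceberg catalog named glue_catalog.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-tables").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.analytics.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(12, 2),
        event_ts    TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Append new rows; Iceberg tracks partition and file metadata itself,
# so the writer does not pass partition columns explicitly.
spark.table("staging_orders").writeTo("glue_catalog.analytics.orders").append()
```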
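
Finally, the unit-testing responsibility usually means pytest-style tests that run a transformation against a local SparkSession. The transform under test below is hypothetical.

```python
# Tiny pytest sketch for a PySpark transform (hypothetical transform).
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_total(df):
    # Transform under test: line total = unit price * quantity.
    return df.withColumn("total", F.col("price") * F.col("qty"))


@pytest.fixture(scope="session")
def spark():
    # Local, single-threaded session is enough for unit tests.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_total(spark):
    df = spark.createDataFrame([(2.0, 3)], ["price", "qty"])
    result = add_total(df).collect()[0]
    assert result["total"] == 6.0
```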
Required Skills:
- 3-5 years of data engineering experience, with strong AWS expertise.
- Proficient in EMR (Spark), S3, PySpark, and SQL.
- Familiar with Cloudera / HDFS and legacy Hadoop pipelines.
- Knowledge of data lake / lakehouse architectures is a plus.

Skills Required:
S3, PySpark, HiveQL, SparkSQL