JOB DESCRIPTION / REQUIREMENT
Role Overview:
We are looking for a Senior Data Engineer with 5+ years of experience in Big Data technologies. The ideal candidate will have strong hands-on experience in Spark, PySpark, and Databricks, and be capable of building scalable and reliable data pipelines. Knowledge of DevOps practices and containerization tools is a plus.
Primary Responsibilities:
- Develop and maintain scalable data pipelines for large-scale data processing.
- Translate business requirements into technical specifications and efficient ETL code.
- Work independently to develop and optimize Spark/PySpark pipelines.
- Write complex business logic using PySpark and Spark SQL (see the pipeline sketch after this list).
- Understand and manage Spark clusters and parallel data processing.
- Develop unit tests for ETL components to ensure data integrity and quality (see the test sketch after this list).
- Build modular, reusable functions to streamline development.
- Work with Amazon Athena: create tables and partitions, and write complex SQL queries (see the Athena sketch after this list).
- Integrate data from relational databases (Oracle, SQL Server) and NoSQL databases (e.g., MongoDB), as shown in the integration sketch after this list.
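As a minimal sketch of the PySpark and Spark SQL work described above (assuming Spark 3.x; the table layout, S3 paths, and business rule are hypothetical illustrations, not part of this requirement):

```python
from pyspark.sql import SparkSession, functions as F

def flag_high_value_orders(orders_df, threshold=1000.0):
    """Reusable, unit-testable transformation: flag orders above a threshold."""
    return orders_df.withColumn(
        "is_high_value", F.col("order_total") > F.lit(threshold)
    )

if __name__ == "__main__":
    spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

    # Hypothetical source path and schema
    orders = spark.read.parquet("s3://example-bucket/orders/")
    flagged = flag_high_value_orders(orders)

    # The same business logic can also be expressed in Spark SQL
    flagged.createOrReplaceTempView("orders_flagged")
    summary = spark.sql("""
        SELECT customer_id,
               COUNT(*) AS order_count,
               SUM(CASE WHEN is_high_value THEN 1 ELSE 0 END) AS high_value_orders
        FROM orders_flagged
        GROUP BY customer_id
    """)

    summary.write.mode("overwrite").parquet("s3://example-bucket/order_summary/")
```

Keeping transformations like flag_high_value_orders as plain functions, rather than inlining them in the job script, is what enables the modular, testable style this role calls for.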
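A matching unit-test sketch, assuming pytest and a local SparkSession; orders_pipeline and flag_high_value_orders are the hypothetical module and function from the sketch above:

```python
import pytest
from pyspark.sql import SparkSession

from orders_pipeline import flag_high_value_orders  # hypothetical module from the sketch above

@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for logic-level ETL tests
    return (
        SparkSession.builder
        .master("local[2]")
        .appName("etl-unit-tests")
        .getOrCreate()
    )

def test_flag_high_value_orders(spark):
    df = spark.createDataFrame(
        [("o1", 1500.0), ("o2", 200.0)],
        ["order_id", "order_total"],
    )
    flags = {
        row["order_id"]: row["is_high_value"]
        for row in flag_high_value_orders(df).collect()
    }
    assert flags == {"o1": True, "o2": False}
```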
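For the Athena item, one common pattern is to drive queries from Python with boto3; the database name, query, and S3 output location below are placeholders:

```python
import boto3

athena = boto3.client("athena")

# start_query_execution is asynchronous; poll get_query_execution for status
response = athena.start_query_execution(
    QueryString="""
        SELECT customer_id, COUNT(*) AS order_count
        FROM orders
        WHERE order_date >= DATE '2024-01-01'
        GROUP BY customer_id
    """,
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```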
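A sketch of the relational/NoSQL integration work; the JDBC URL, credentials, and MongoDB option names (Spark connector v10 style is assumed here) would need to match the actual environment, and the JDBC driver and Mongo connector must be on the Spark classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db-integration").getOrCreate()

# Relational source over JDBC (Oracle shown; SQL Server differs mainly in URL/driver)
oracle_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1")  # hypothetical host
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "...")  # supply via a secret store, not a literal
    .load()
)

# NoSQL source via the MongoDB Spark connector
mongo_df = (
    spark.read.format("mongodb")
    .option("spark.mongodb.read.connection.uri", "mongodb://mongo-host:27017")
    .option("spark.mongodb.read.database", "shop")
    .option("spark.mongodb.read.collection", "customers")
    .load()
)

# Downstream logic can then join the two sources as ordinary DataFrames
joined = oracle_df.join(mongo_df, oracle_df["CUSTOMER_ID"] == mongo_df["customer_id"])
```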
Technical Skills Required (Primary):
- Apache Spark, PySpark, Spark SQL
- Databricks
- Python (for Data Engineering use cases)
- Relational Databases: Oracle, SQL Server
- NoSQL: MongoDB
- Big Data Architecture and scalable pipeline design
- Java or Scala (good to have)
Secondary / Desirable Skills:
- AWS IAM: Role and policy creation (see the sketch after this list)
- Docker: Image creation and container management
- Camunda: Workflow orchestration and process management
- Kubernetes: Container orchestration (good to have)
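On the desirable-skills side, IAM role and policy creation is usually scripted; a boto3 sketch with hypothetical role, service principal, and bucket names:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: which service may assume the role (Glue is just an example)
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="etl-pipeline-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Inline policy granting the role access to a hypothetical results bucket
iam.put_role_policy(
    RoleName="etl-pipeline-role",
    PolicyName="athena-results-access",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-bucket/athena-results/*",
        }],
    }),
)
```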