Role : Data Engineer
Location : 5th Floor, 3 North Avenue, Maker Maxity, Bandra Kurla Complex, Mumbai
Contract : 12+ months, likely long term
Work Mode : Hybrid
Notice Period : Immediate - 15 days
- Strong Experience in PySpark
  - Hands-on expertise in building scalable data pipelines using PySpark.
  - Proficiency in using Spark SQL, DataFrame, and RDD APIs to implement complex business logic (see the sketch below).
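For illustration, a minimal sketch of this kind of DataFrame / Spark SQL work; the table, columns, and bucket path are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-example").getOrCreate()

# Hypothetical input: orders landed in the data lake
orders = spark.read.parquet("s3://example-bucket/orders/")

# Business logic via the DataFrame API
revenue_by_region = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_revenue"))
)

# The same logic expressed in Spark SQL
orders.createOrReplaceTempView("orders")
revenue_by_region_sql = spark.sql("""
    SELECT region, SUM(amount) AS total_revenue
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY region
""")
```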
- Proficient Programming Skills
  - Solid coding skills in Python (preferred), with strong fundamentals in data structures, algorithms, and software engineering principles.
- Data Pipeline Development
  - Proven experience designing, developing, and maintaining batch and streaming data pipelines.
  - Understanding of ETL / ELT processes and best practices for data transformation, data quality, and performance optimization (a minimal batch sketch follows).
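A minimal batch ETL sketch showing the extract-transform-load shape such pipelines take; the paths, columns, and dedup key are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/events/")      # extract
clean = (
    raw.dropDuplicates(["event_id"])                          # data quality: dedup
       .withColumn("event_date", F.to_date("event_ts"))       # transform
       .filter(F.col("event_date").isNotNull())               # drop malformed rows
)
(clean.write
      .mode("overwrite")
      .partitionBy("event_date")                              # enables partition pruning downstream
      .parquet("s3://example-bucket/curated/events/"))        # load
```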
- Knowledge of Modern Data Engineering Ecosystem
  - Familiarity with the current data engineering landscape, including distributed data processing, storage systems, and workflow orchestration tools.
  - Tools / technologies could include Apache Airflow, dbt, Delta Lake, etc. (an Airflow sketch follows).
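A minimal orchestration sketch using the Airflow TaskFlow API (the `schedule` argument assumes Airflow 2.4+); the DAG id and task bodies are hypothetical placeholders:

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_events_pipeline():
    @task
    def extract() -> str:
        # e.g. land raw files in the data lake, return their path
        return "s3://example-bucket/raw/events/"

    @task
    def transform(path: str) -> None:
        ...  # e.g. submit the Spark job against the landed data

    transform(extract())

daily_events_pipeline()
```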
- Cloud Data Platform Experience (Preferably AWS)
  - Experience working with cloud services such as:
    - AWS S3 (data lake)
    - AWS Glue / EMR (for Spark workloads)
    - AWS Lambda, Step Functions, or similar for orchestration / integration (see the sketch after this list)
    - Redshift or other cloud data warehouses
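As a sketch of the Lambda-to-Glue integration pattern, a handler that starts a Glue Spark job when triggered by an S3 event; the job name and argument are hypothetical:

```python
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Triggered by an S3 put event; kick off the downstream Spark workload
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    run = glue.start_job_run(
        JobName="curate-events",            # hypothetical Glue job
        Arguments={"--input_bucket": bucket},
    )
    return {"JobRunId": run["JobRunId"]}
```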
- Spark API Expertise for Business Logic Implementation
  - Ability to choose and apply the right Spark APIs (DataFrame, Dataset, RDD) for performance-efficient implementation of business logic at scale (contrasted in the sketch below).
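A short sketch of why the API choice matters: the DataFrame version is planned by Catalyst and runs through optimized execution, while the equivalent RDD version opts out of those optimizations and pays Python serialization costs per record (columns are hypothetical):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumn("value", F.col("id") % 100)

# Preferred: DataFrame API -- Catalyst-optimized aggregation
df.groupBy("value").count().show()

# Same logic on the RDD API -- no Catalyst/Tungsten optimization;
# reserve RDDs for logic that cannot be expressed relationally
(df.rdd
   .map(lambda row: (row["value"], 1))
   .reduceByKey(lambda a, b: a + b)
   .take(5))
```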
Skills Required
Spark SQL, AWS Lambda, PySpark, AWS Glue, Redshift, ELT, Apache Airflow, dbt, Python, AWS S3, ETL, AWS