Role: Data Platform Engineer
Location: Mumbai (Hybrid)
Contract Duration: 12+ months, extendable
Notice Period: Immediate joiners only
Strong Experience in PySpark
- Hands-on expertise in building scalable data pipelines using PySpark.
- Proficiency in using Spark SQL, DataFrame, and RDD APIs to implement complex business logic (see the sketch below).
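For illustration, a minimal PySpark sketch of this kind of logic, expressing the same aggregation through both the DataFrame API and Spark SQL; the table name, columns, and S3 path are hypothetical.

```python
# Minimal PySpark sketch: one aggregation via the DataFrame API and Spark SQL.
# Table and column names (orders, region, amount, status) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")  # hypothetical path

# DataFrame API: total revenue per region, highest first.
revenue_df = (
    orders
    .where(F.col("status") == "COMPLETED")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_revenue"))
    .orderBy(F.desc("total_revenue"))
)

# Equivalent Spark SQL: register the DataFrame as a view and query it.
orders.createOrReplaceTempView("orders")
revenue_sql = spark.sql("""
    SELECT region, SUM(amount) AS total_revenue
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY region
    ORDER BY total_revenue DESC
""")
```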
Proficient Programming Skills
- Solid coding skills in Python (preferred), with strong fundamentals in data structures, algorithms, and software engineering principles.
Data Pipeline Development
- Proven experience designing, developing, and maintaining batch and streaming data pipelines.
- Understanding of ETL / ELT processes and best practices for data transformation, data quality, and performance optimization (a minimal ETL sketch follows).
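As a concrete example, a minimal batch ETL sketch in PySpark: extract raw CSV, apply a basic data-quality rule, and load partitioned Parquet. All paths and column names are hypothetical, and the dropna call stands in for a fuller data-quality step.

```python
# Minimal batch ETL sketch in PySpark. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: raw CSV landed by an upstream system.
raw = spark.read.csv(
    "s3://example-bucket/raw/orders/", header=True, inferSchema=True
)

# Transform: enforce a simple quality rule and derive a partition column.
clean = (
    raw
    .dropna(subset=["order_id", "amount"])          # quality: reject incomplete rows
    .withColumn("order_date", F.to_date("created_at"))
)

# Load: columnar, partitioned output for efficient downstream reads.
(
    clean.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/orders/")
)
```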
Knowledge of Modern Data Engineering Ecosystem
- Familiarity with the current data engineering landscape, including distributed data processing, storage systems, and workflow orchestration tools.
- Tools / technologies could include Apache Airflow, dbt, Delta Lake, etc. (an Airflow sketch follows this list).
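One orchestration example: a minimal Apache Airflow DAG sketch chaining an extract task to a transform task. The DAG id, schedule, and task bodies are hypothetical placeholders; the `schedule` argument assumes Airflow 2.4+ (older versions use `schedule_interval`).

```python
# Minimal Airflow sketch: a daily DAG running extract before transform.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull raw files from the source system.
    ...

def transform():
    # Placeholder: submit the PySpark job that cleans and aggregates the data.
    ...

with DAG(
    dag_id="orders_daily",            # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # assumes Airflow 2.4+
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task    # run extract, then transform
```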
- Experience working with cloud services such as:
  - AWS S3 (data lake)
  - AWS Glue / EMR (for Spark workloads)
  - AWS Lambda, Step Functions, or similar for orchestration / integration
  - Redshift or other cloud data warehouses
Spark API Expertise for Business Logic Implementation
- Ability to choose and apply the right Spark APIs (DataFrame, Dataset, RDD) for performance-efficient implementation of business logic at scale (the two approaches are contrasted in the sketch below).
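To illustrate the API-choice point, a sketch computing the same filter-and-sum twice: the DataFrame version is typically preferred at scale because it runs through Spark's Catalyst optimizer, while the RDD version executes row-at-a-time Python lambdas. The S3 path and column names are hypothetical.

```python
# Sketch contrasting the DataFrame and RDD APIs for the same filter + sum.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("api-choice-sketch").getOrCreate()

events = spark.read.parquet("s3://example-bucket/events/")  # hypothetical path

# DataFrame API: declarative, optimized by Catalyst, benefits from columnar reads.
df_total = (
    events
    .where(F.col("event_type") == "purchase")
    .agg(F.sum("amount"))
    .first()[0]
)

# RDD API: flexible, but opaque to the optimizer and slower for logic like this.
rdd_total = (
    events.rdd
    .filter(lambda row: row["event_type"] == "purchase")
    .map(lambda row: row["amount"])
    .sum()
)
```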