- Design, develop, and maintain scalable and efficient data pipelines using PySpark and Databricks.
- Perform data extraction, transformation, and loading (ETL) from diverse structured and unstructured data sources.
- Develop and maintain data models, data warehouses, and data marts in Databricks.
- Apply strong proficiency in Python, Apache Spark, and PySpark.
- Integrate third-party data from multiple sources with internal data.
- Write and optimize complex SQL queries for high performance and scalability across large datasets.
- Collaborate closely with data scientists, analysts, and business stakeholders to gather and understand data requirements.
- Ensure data quality, consistency, and integrity throughout the data lifecycle using validation and monitoring techniques.
- Develop and maintain modular, reusable, and well-documented code and technical documentation for data workflows and processes.
- Implement data governance, security, and compliance best practices.

Skills Required
PySpark, Apache Spark, Databricks, Python, SQL