Develop and maintain scalable data pipelines using PySpark and Delta Lake, ensuring efficient processing of large structured and semi-structured datasets.
Analyze complex datasets using SQL and Python (Pandas, NumPy, scikit-learn, Seaborn) to identify trends, patterns, and opportunities for process improvement.
Continuously optimize the performance of data processing systems and maintain high levels of accuracy.
Design and build interactive dashboards and reports using tools like Metabase, Tableau, or Power BI.
Collaborate with management to prioritize business and information needs.
Requirements:
Bachelor's or Master's degree in Statistics, Mathematics, Engineering, Computer Science, or a related field.
2+ years of experience as a Business Analyst, with strong expertise in PySpark, Delta Lake, and SQL.
Solid understanding of data models, database design, and development.
Experience building incremental data pipelines and implementing watermark-based processing.
Proficiency with Python and libraries like Pandas, NumPy, scikit-learn, and Seaborn for data analysis and visualization.
Experience developing dashboards and interactive reports using Metabase, Tableau, or similar tools.
Strong analytical skills with a keen eye for detail, accuracy, and quality.