Minimum Years of Experience
- 3-5 years in Python and SQL programming, ETL
- 2+ years in Spark, data visualization

Duties and Responsibilities
- Data processing: data querying, analysis, and validation
- Build, optimize, and maintain data flows within data warehouses, storage, and cloud environments
- Data modeling: design scalable data pipeline schemas and architecture
- Deploy and monitor data pipelines in production
Requirements
Basic Qualifications
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field
- Proficient in Python, SQL, and Spark/PySpark
- Design, develop, and maintain efficient and scalable data pipelines for both batch and real-time processing
- Build and manage data integration workflows between internal and external data sources
- Implement and optimize data models, schemas, and storage structures for analytics and reporting
- Collaborate with data scientists, analysts, and business teams to ensure data availability, quality, and reliability
- Develop and enforce data quality, governance, and security best practices
- Automate data ingestion, transformation, and validation processes
- Deploy and monitor data pipelines in cloud environments (Azure, GCP)
- Support data infrastructure scalability, performance tuning, and cost optimization
- Document data flow processes, architecture, and best practices
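To illustrate the kind of batch extract-validate-load work described above, here is a minimal, self-contained sketch using only the Python standard library. All table names, field names, and sample data are hypothetical; a production pipeline would typically use Spark/PySpark and a real warehouse rather than in-memory SQLite.

```python
import csv
import io
import sqlite3

# Hypothetical raw feed; in practice this would arrive from an external source.
RAW_CSV = """order_id,amount,country
1,19.99,IN
2,,IN
3,42.50,US
"""

def extract(text):
    """Extract step: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def validate(rows):
    """Quality check: keep only rows with a non-empty, non-negative amount."""
    return [r for r in rows if r["amount"] and float(r["amount"]) >= 0]

def load(rows, conn):
    """Load step: write validated rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(int(r["order_id"]), float(r["amount"]), r["country"]) for r in rows],
    )
    conn.commit()

# Run the pipeline: the row with a missing amount is dropped by validation.
conn = sqlite3.connect(":memory:")
load(validate(extract(RAW_CSV)), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
```

The same extract/validate/load separation carries over to larger pipelines; each stage can then be tested, monitored, and scaled independently.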
Preferred Qualifications
- Knowledge of machine learning and AI engineering
- MLOps in Databricks
Skills Required
Machine Learning, PySpark, SQL, MLOps, GCP, Spark, Data Visualization, Databricks, Azure, Python, ETL
Data Engineer (Python, SQL) • Chennai, India