Key Responsibilities:

Data Pipeline Development:
- Design, develop, and implement scalable, efficient data pipelines to ingest, transform, and load data from various sources (e.g., databases, flat files, APIs, cloud services); a minimal pipeline sketch appears after this section.
- Work with ETL tools (e.g., Apache Spark, Talend, Informatica) to extract, transform, and load data into the organization's data warehouse or data lake.
- Optimize data pipelines for speed, scalability, and cost-efficiency, ensuring they can handle large volumes of data.

Data Architecture & Design:
- Collaborate with data architects and engineers to design and implement data storage solutions (e.g., data lakes, data warehouses, NoSQL databases) that support business intelligence and data analytics.
- Create and maintain data models, schemas, and data dictionaries to ensure data consistency and integrity across different systems.
- Apply best practices for data governance, including data quality, metadata management, and data security.

Data Integration:
- Integrate diverse datasets from internal and external sources (e.g., third-party APIs, cloud-based systems) into the data environment, ensuring compatibility and consistency.
- Develop and maintain automated data integration workflows, ensuring smooth and reliable movement of data between systems.
- Perform data cleansing and transformation to ensure high-quality data for reporting, analysis, and decision-making.

Big Data & Cloud Solutions:
- Work with cloud data platforms (e.g., AWS Redshift, Azure Synapse Analytics, Google BigQuery) to build scalable and efficient data storage solutions.
- Implement big data technologies such as Hadoop, Spark, and Kafka to handle large datasets and enable real-time data processing (see the streaming sketch after this section).
- Manage cloud-based data services for strong performance, scalability, and cost-effectiveness.

Collaboration & Cross-functional Support:
- Collaborate with data scientists, business analysts, and other stakeholders to understand their data requirements and deliver solutions that meet business needs.
- Provide technical support to data analysts and data scientists, ensuring they have the data and tools needed to perform their tasks efficiently.
- Work closely with stakeholders to translate business requirements into technical specifications and data solutions.

Data Quality & Performance Monitoring:
- Monitor and maintain data pipeline performance, troubleshoot issues, and resolve data quality problems to ensure high accuracy and reliability; a simple quality check is sketched after this section.
- Implement monitoring and alerting solutions to track pipeline performance and data quality, enabling proactive issue resolution.
- Conduct regular performance tuning and optimization of data systems and queries to improve speed and efficiency.

Documentation & Reporting:
- Create and maintain technical documentation for data pipelines, architecture, and processes, ensuring it is clear, concise, and accessible to the team.
- Develop reports and dashboards to communicate data quality, pipeline status, and other key metrics to stakeholders and management.
- Ensure that all data work is thoroughly documented and adheres to compliance and data governance policies.
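To give candidates a concrete sense of the pipeline-development work, below is a minimal batch ETL sketch in PySpark. The S3 paths, dataset, and column names are hypothetical; the explicit schema at ingest also acts as the kind of basic data-quality safeguard the architecture responsibilities describe.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, DateType

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Declaring the schema up front enforces types at ingest instead of at query time.
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("order_date", DateType()),
])

# Extract: read a raw CSV drop from a landing zone (path is hypothetical).
raw = spark.read.csv("s3://landing-zone/orders/", header=True, schema=schema)

# Transform: drop rows missing the key, default null amounts, deduplicate.
clean = (
    raw.dropna(subset=["order_id"])
       .withColumn("amount", F.coalesce(F.col("amount"), F.lit(0.0)))
       .dropDuplicates(["order_id"])
)

# Load: write partitioned Parquet into the data lake for downstream analytics.
clean.write.mode("overwrite").partitionBy("order_date").parquet("s3://lake/curated/orders/")
```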
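The real-time processing responsibility typically pairs Kafka with Spark Structured Streaming; a sketch under assumed names follows. The broker address, topic, event fields, and lake paths are all hypothetical, and the job needs the spark-sql-kafka connector package on its classpath.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

# Assumed shape of the JSON events on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", TimestampType()),
])

# Consume from Kafka and parse the binary value column into typed fields.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Append each micro-batch to Parquet; the checkpoint makes the job restartable.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3://lake/raw/events/")
          .option("checkpointLocation", "s3://lake/checkpoints/events/")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```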
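For the monitoring responsibility, here is a simple data-quality check of the kind a pipeline might run after each load: a row count plus per-column null rates, compared against illustrative thresholds. The path, column list, and 5% threshold are assumptions; a production job would publish to an alerting channel rather than simply raise.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

# Path and column list are hypothetical.
df = spark.read.parquet("s3://lake/curated/orders/")
cols = ["order_id", "customer_id", "amount"]

# One pass over the data: total row count plus a null count per column.
stats = df.select(
    F.count(F.lit(1)).alias("total"),
    *[F.sum(F.col(c).isNull().cast("int")).alias(c) for c in cols],
).first()

total = stats["total"]
null_rates = {c: (stats[c] or 0) / max(total, 1) for c in cols}

# Illustrative thresholds; a real job would notify an alerting system here.
failures = {c: rate for c, rate in null_rates.items() if rate > 0.05}
if total == 0 or failures:
    raise RuntimeError(f"Data-quality check failed: rows={total}, null rates={failures}")
```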
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field.
- 3-5 years of experience in data engineering, data architecture, or related fields.
- Proficiency in SQL, including the ability to write complex queries and optimize query performance (see the example query after this list).
- Experience with ETL tools such as Apache Spark, Talend, Informatica, or custom ETL frameworks.
- Strong knowledge of big data technologies such as Hadoop, Kafka, Spark, or Flume.
- Familiarity with cloud platforms such as AWS, Azure, or Google Cloud Platform, particularly for data storage and processing solutions.
- Experience with data warehousing solutions such as Redshift, Snowflake, or BigQuery.
- Understanding of data integration, data modeling, and data governance best practices.
- Strong problem-solving skills with the ability to troubleshoot data issues and optimize systems for performance.
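As a sketch of the analytical SQL this role calls for, the query below (run through Spark SQL against a hypothetical orders view) uses a window function to keep only each customer's most recent order; reading the plan from explain() is a typical first step in query tuning.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql_example").getOrCreate()

# Hypothetical curated table registered as a temporary view.
spark.read.parquet("s3://lake/curated/orders/").createOrReplaceTempView("orders")

# Window function: rank orders per customer, then keep only the latest one.
latest_orders = spark.sql("""
    SELECT order_id, customer_id, amount, order_date
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_date DESC
               ) AS rn
        FROM orders
    ) ranked
    WHERE rn = 1
""")

latest_orders.explain()  # inspecting the physical plan guides query optimization
```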
Skills Required:
Apache Spark, Talend, Informatica, APIs, Data Security