Description
Key Responsibilities:
- Data Pipeline Development: Design, build, and optimize scalable, secure, and reliable data pipelines to ingest, process, and transform large volumes of structured and unstructured data (see the sketch after this list).
- Data Architecture: Architect and maintain data storage solutions, including data lakes, data warehouses, and databases, ensuring performance, scalability, and cost-efficiency.
- Data Integration: Integrate data from diverse sources, including APIs, third-party systems, and streaming platforms, ensuring data quality and consistency.
- Performance Optimization: Monitor and optimize data systems for performance, scalability, and cost, implementing best practices for partitioning, indexing, and caching.
- Collaboration: Work closely with data scientists, analysts, and software engineers to understand data needs and deliver solutions that enable advanced analytics, machine learning, and reporting.
- Data Governance: Implement data governance policies, ensuring compliance with data security, privacy regulations (e.g., GDPR, CCPA), and internal standards.
- Automation: Develop automated processes for data ingestion, transformation, and validation to improve efficiency and reduce manual intervention.
- Mentorship: Guide and mentor junior data engineers, fostering a culture of technical excellence and continuous learning.
- Troubleshooting: Diagnose and resolve complex data-related issues, ensuring high availability and reliability of data systems.
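As a rough illustration of the pipeline and automation work described above, a minimal Apache Airflow sketch of an ingest-transform-validate flow might look like the following. The DAG id, task names, and schedule here are hypothetical placeholders, and Airflow 2.4+ is assumed:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest(**context):
    """Pull a day's raw records from an upstream source (stub)."""
    ...


def transform(**context):
    """Normalize and enrich the raw records (stub)."""
    ...


def validate(**context):
    """Run basic data-quality checks before publishing (stub)."""
    ...


with DAG(
    dag_id="example_daily_ingest",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # `schedule` supersedes `schedule_interval` in Airflow 2.4+
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)

    # Ingestion feeds transformation, which feeds validation.
    t_ingest >> t_transform >> t_validate
```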
Required Qualifications
- Education: Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
- Experience: 3+ years of experience in data engineering or a related role, with a proven track record of building scalable data pipelines and infrastructure.
Technical Skills
- Proficiency in programming languages such as Python.
- Expertise in SQL and experience with NoSQL databases (e.g., MongoDB, Cassandra).
- Strong experience with cloud platforms (e.g., AWS, GCP) and their data services (e.g., Redshift, BigQuery, Snowflake).
- Hands-on experience with ETL/ELT tools (e.g., Apache Airflow, Talend, Informatica) and data integration frameworks.
- Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka) and distributed systems (see the sketch after this list).
- Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a plus.
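For a similarly hedged illustration of the partitioning work mentioned under Performance Optimization, a minimal PySpark sketch that writes date-partitioned Parquet might look like this; the bucket paths and the event_date column are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-sketch").getOrCreate()

# Hypothetical raw event data; the bucket and schema are placeholders.
events = spark.read.json("s3://example-bucket/raw/events/")

(
    events.repartition("event_date")  # group each date's rows together
    .write.mode("overwrite")
    .partitionBy("event_date")  # one directory per date, so downstream
    .parquet("s3://example-bucket/curated/events/")  # queries can prune by partition
)
```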
Soft Skills
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
- Ability to work in a fast-paced, dynamic environment and manage multiple priorities.
Certifications (optional but preferred): Cloud certifications (e.g., AWS Certified Data Analytics, Google Professional Data Engineer) or relevant data engineering certifications.
Skills Required
BigQuery, Hadoop, Kafka, Redshift, SQL, GCP, Docker, Snowflake, Spark, Kubernetes, Python, AWS