You will take on the role of a technical expert and lead member of our Engineering team.
Design and build data pipelines from ingestion to consumption within a hybrid big data architecture, using cloud-native AWS services, SQL, NoSQL, etc.
Design and develop distributed, high-volume, high-velocity, multi-threaded event processing systems.
Design, build, and operationalize large-scale enterprise data solutions using Hadoop-based technologies along with AWS, Spark, Hive, and data lakes, programming in Hive, PySpark, and Python.
Develop efficient software code, leveraging Python and big data technologies, for the various use cases built on the platform.
Ingest data from files, streams, and databases, and process it with Hadoop, Scala, SQL/NoSQL databases, Spark, ML, and IoT technologies.
Develop programs in Scala and Python as part of data cleaning and processing.
This includes data modelling, data ingestion, transformation, data consumption patterns, optimizing complex queries, and creating efficient UDFs to extend functionality (see the sketch after this list).
You will be involved in product feature development and will be working in close partnership with other engineering teams.
You will be responsible for mentoring other team members and for ensuring the high availability and stability of the platform for batch and stream processing systems.
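For illustration only, a minimal PySpark sketch of the kind of UDF work described above; the column names, sample data, and normalization logic are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    # Hypothetical raw records with a free-form country field
    df = spark.createDataFrame(
        [("u1", "usa"), ("u2", "U.S.A."), ("u3", "india")],
        ["user_id", "country"],
    )

    # A simple Python UDF that normalizes country values; built-in functions
    # are preferable where possible, since Python UDFs add serialization cost
    @F.udf(returnType=StringType())
    def normalize_country(value):
        if value is None:
            return None
        cleaned = value.replace(".", "").strip().lower()
        return "US" if cleaned in ("usa", "us") else cleaned.title()

    df.withColumn("country_norm", normalize_country(F.col("country"))).show()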
Who you need to be?
Experience: minimum 6 years.
Bachelor's degree in computer science or a computing-related discipline from a premier institute.
The candidate should have data processing (ETL) ability and scripting experience.
Data engineer with 3 to 8 years of hands-on experience with SQL and NoSQL databases such as HBase, Cassandra, or MongoDB.
Extensive experience (2+ years) working with Hadoop and related processing frameworks such as Spark, Hive, Kafka, etc.
Experience working with REST- and SOAP-based APIs to extract data for data pipelines.
Experience with cloud-native ETL languages and frameworks (Scala, Python, Databricks, AWS Glue).
Experience working with real-time data streams and the Kafka platform (see the streaming sketch below).
Working knowledge of workflow orchestration tools like Apache Airflow, including designing and deploying DAGs (see the DAG sketch below).
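As a rough illustration of the real-time streaming work mentioned above, a minimal Spark Structured Streaming read from Kafka might look like the sketch below; the broker address and topic name are assumptions, and the spark-sql-kafka connector package must be on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Subscribe to a hypothetical 'events' topic on a local broker
    stream = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka delivers key and value as binary; cast them to strings before use
    events = stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    # Write to the console for demonstration; a real pipeline would write to a
    # data lake or warehouse sink with checkpointing configured
    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()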
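Similarly, a minimal Apache Airflow DAG sketch of the orchestration work implied above; the DAG id, task ids, and shell commands are hypothetical placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # A hypothetical daily ingest-then-transform pipeline
    with DAG(
        dag_id="daily_ingest_and_transform",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest_raw_files",
            bash_command="echo 'copy raw files into the landing zone'",
        )
        transform = BashOperator(
            task_id="run_spark_transform",
            bash_command="echo 'spark-submit the transformation job'",
        )

        # Ingestion must complete before the transform runs
        ingest >> transform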