Job Description:
We are looking for a candidate with strong experience in Spark with Scala, machine learning, and AI. The candidate should have hands-on programming experience and a good understanding of data structures, algorithms, data transformation, data ingestion, optimization mechanisms and techniques, and big data technologies (Hadoop, MapReduce, Kafka, Cassandra).
We deal with data at massive scale, so we are looking for engineers who love solving challenging problems through independent research and collaboration across our product teams to improve the overall product experience.
Key Responsibilities:
- Develop, optimize, and maintain large-scale distributed data processing systems using Apache Spark with Scala.
- Design and implement complex data transformation and ingestion pipelines for structured and unstructured data (a minimal illustrative sketch follows this list).
- Collaborate with data scientists and AI/ML engineers to integrate machine learning models into data workflows.
- Optimize Spark jobs for performance, scalability, and reliability.
- Work with large-scale data platforms involving Hadoop, Kafka, Cassandra, MapReduce, and related big data technologies.
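For illustration only, here is a minimal sketch of the kind of Spark-with-Scala ingestion and transformation job described above. All paths, column names, and the job name are hypothetical examples, not part of the actual role.

import org.apache.spark.sql.{SparkSession, functions => F}

object EventIngestJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-ingest") // hypothetical job name
      .getOrCreate()

    // Ingest raw JSON events from a hypothetical HDFS path.
    val raw = spark.read.json("hdfs:///data/raw/events")

    // Basic cleanup: drop rows without a user id, derive a date
    // column for partitioning, and de-duplicate on the event id.
    val cleaned = raw
      .filter(F.col("userId").isNotNull)
      .withColumn("eventDate", F.to_date(F.col("timestamp")))
      .dropDuplicates("eventId")

    // Repartition on the partition column before writing so rows for
    // the same date land in the same tasks, then persist as Parquet.
    cleaned
      .repartition(F.col("eventDate"))
      .write
      .mode("overwrite")
      .partitionBy("eventDate")
      .parquet("hdfs:///data/curated/events") // hypothetical sink

    spark.stop()
  }
}

Aligning repartition with the partitionBy column is one common tuning choice of the kind covered by the optimization responsibility above; the right strategy depends on the actual data volumes and cluster.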
Required Skills and Qualifications:
- 5 to 10 years of hands-on experience in data engineering with a focus on Spark and Scala.
- Strong knowledge of data structures, algorithms, and software engineering principles.
- Proficient in building data ingestion and ETL pipelines from diverse data sources.
- Experience with machine learning frameworks and integrating ML/AI models into data pipelines (see the sketch after this list).
- Solid understanding of big data ecosystems including Hadoop, Kafka, Cassandra, and MapReduce.
- Proven experience with performance tuning, optimization techniques, and distributed systems.
- Familiarity with CI/CD pipelines, containerization (Docker), and orchestration tools (Kubernetes) is a plus.
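As a hedged illustration of the ML-integration qualification above, a batch job might load a pre-trained Spark MLlib PipelineModel and apply it to curated feature data. The model path, dataset paths, and column names below are hypothetical.

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object BatchScoringJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-scoring") // hypothetical job name
      .getOrCreate()

    // Hypothetical curated feature table produced by an upstream ETL job.
    val features = spark.read.parquet("hdfs:///data/curated/features")

    // Load a previously trained and saved MLlib pipeline model (hypothetical path).
    val model = PipelineModel.load("hdfs:///models/churn/v1")

    // transform() appends the model's prediction columns to each input row.
    model.transform(features)
      .select("userId", "prediction")
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/scored/churn") // hypothetical sink

    spark.stop()
  }
}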