About the Role:
We are seeking a highly skilled Data Engineer to design, build, and maintain scalable, high-performance data pipelines and systems that support advanced analytics and business intelligence.
You will work with cutting-edge big data technologies and frameworks to ingest, process, transform, and manage massive datasets.
The ideal candidate thrives in a fast-paced environment, is passionate about data innovation, and excels at bridging academic research with production-scale business solutions.
Key Responsibilities:
- Design, develop, and maintain robust, scalable big data pipelines using tools such as Apache Spark, Airflow, Bodo, Flume, Flink, and others for ingesting and processing terabytes of data efficiently.
- Architect and implement data warehouses and data marts utilizing technologies like Presto, Snowflake, Hadoop, and other cloud or on-premise data platforms.
- Build optimized data models and schemas to enable fast and reliable query performance.
- Develop and manage strategies for data ingestion, cleansing, transformation, and aggregation from heterogeneous sources including structured, semi-structured, and unstructured data.
- Understand and apply graph query languages such as GQL, Gremlin, and Cypher for modeling and querying complex relationships.
- Build scalable graph-based system architectures to support business use cases involving relationship and network analysis.
- Apply graph data technologies for business impact, including data management, infrastructure optimization, budgeting, trade-offs, and workflow/project management.
- Write efficient and maintainable code in Python, Scala, or Rust, especially for supercomputing environments handling large-scale datasets (TB+).
- Develop quantitative analytics and business operation dashboards using tools like Tableau, Apache Superset, or similar visualization platforms.
- Stay abreast of developments in natural language processing (NLP) and large language models (LLMs) and incorporate these capabilities where relevant.
- Evaluate cutting-edge academic methods and prototype their application in production systems.
- Use advanced problem-solving skills combined with deep understanding of statistics, probability, algorithms, and mathematics to deliver innovative data solutions.
- Collaborate effectively with data scientists, software engineers, product managers, and business stakeholders to deliver high-impact projects.
- Operate efficiently in a dynamic, fast-moving development environment with changing priorities, tight deadlines, and limited resources.
- Demonstrate strong self-motivation and ability to work independently as well as within cross-functional teams.
Required Qualifications & Skills:
- Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related discipline.
- Extensive experience building and managing big data pipelines using Apache Spark, Airflow, Flink, Kafka, or comparable frameworks.
- Strong expertise in data warehousing solutions and data modeling using platforms such as Snowflake, Presto, and Hadoop.
- Proficiency with graph databases and query languages (GQL, Gremlin, Cypher) is highly desirable.
- Advanced programming skills in Python, Scala, or Rust, with a focus on performance and scalability.
- Familiarity with data visualization tools such as Tableau, Superset, or equivalents.
- Experience with data mining, relational and NoSQL databases, and data automation practices.
- Understanding of natural language processing (NLP) and experience working with large language models is a plus.
- Strong foundation in probability, statistics, algorithms, and mathematical modeling.
- Excellent analytical, problem-solving, and communication skills.

(ref: hirist.tech)