Job Description :
We are looking for a Big Data Engineer to build and manage the Big Data pipelines that process the large structured data sets we use to generate accurate analytics at scale for our customers. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.
Core Responsibilities :
- Design, build, and maintain robust data pipelines (batch or streaming) that process and transform data from diverse sources.
- Ensure data quality, reliability, and availability across the pipeline lifecycle.
- Collaborate with product managers, architects, and engineering leads to define technical strategy.
- Participate in code reviews, testing, and deployment processes to maintain high standards.
- Own smaller components of the data platform or pipelines and take end-to-end responsibility.
- Continuously identify and resolve performance bottlenecks in data pipelines.
- Take initiative, show the drive to pick up new things proactively, and work as a senior individual contributor across our multiple products and features.
Required Qualifications :
- 5 to 7 years of experience in Big Data or data engineering roles.
- JVM based languages like Java or Scala are preferred. For someone having solid Big Data experience, Python would also be OK.
- Proven and demonstrated experience working with distributed Big Data tools and processing frameworks like Apache Spark or equivalent (for processing), Kafka or Flink (for streaming), and Airflow or equivalent (for orchestration).
- Familiarity with cloud platforms (e.g., AWS, GCP, or Azure), including services like S3, Glue, BigQuery, or EMR.
- Ability to write clean, efficient, and maintainable code.
- Good understanding of data structures, algorithms, and object-oriented programming.
Tooling & Ecosystem :
- Use of version control (e.g., Git) and CI/CD tools.
- Experience with data orchestration tools (Airflow, Dagster, etc.).
- Understanding of file formats like Parquet, Avro, ORC, and JSON.
- Basic exposure to containerization (Docker) or infrastructure-as-code (Terraform is a plus).