Talent.com
Data Engineer - Python / Spark

O2F Info Solutions Pvt. Ltd., Chennai
30+ days ago
Job description

Job Summary :

We are seeking a highly skilled Senior Data Engineer to join our data engineering team. The ideal candidate has 4 to 8 years of experience building robust data pipelines and working extensively with PySpark.

Key Responsibilities :

Data Pipeline Development :

  • Design, build, and maintain scalable data pipelines using PySpark to process large datasets and support data-driven applications and analytics.

ETL Process Automation :

  • Develop and automate ETL (Extract, Transform, Load) processes using PySpark, ensuring efficient data processing, transformation, and loading from diverse sources into data lakes, warehouses, or databases.
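The Extract-Transform-Load flow described above can be illustrated with a toy sketch. This example uses only the Python standard library rather than PySpark, and the column names, table name, and sample data are invented for illustration:

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list[dict]:
    """Extract: parse raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with missing amounts, normalise names and types."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        out.append((row["order_id"], row["customer"].strip().lower(), float(row["amount"])))
    return out


def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Load: write cleaned rows into the target table, return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


raw = "order_id,customer,amount\n1, Alice ,10.5\n2,Bob,\n3,carol,7.0\n"
conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(raw)), conn)
print(loaded)  # row 2 has no amount and is dropped, so 2 rows are loaded
```

In a production PySpark pipeline the same three stages would typically become DataFrame reads, transformations, and writes, but the separation of concerns is the same.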
Distributed Computing with PySpark :

  • Leverage Apache Spark and PySpark to process large-scale data in a distributed computing environment, optimizing for performance and scalability.
Cloud Data Solutions :

  • Develop and deploy data pipelines and processing frameworks on cloud platforms (AWS, Azure, GCP) using native tools like AWS Glue, Azure Databricks, or Google Dataproc.
Data Integration & Transformation :

  • Integrate data from various internal and external sources, ensuring data consistency, quality, and reliability throughout the pipeline.
Performance Optimization :

  • Optimize PySpark jobs and pipelines for faster data processing, handling large volumes of data efficiently with minimal latency.
Required Qualifications :

  • 4-8 years of experience in data engineering, with a strong focus on PySpark and large-scale data processing.
  • Proven experience as a Data Engineer or in a similar role, with a strong background in database development, ETL processes, and software development.
  • Proficiency in SQL and scripting languages such as Python, with experience working with relational databases.
  • Proficiency in Dataproc (PySpark), Pandas, or other data processing libraries.
  • Experience with data modeling, schema design, and optimization techniques for scalability.
  • Strong analytical and problem-solving skills, with the ability to troubleshoot complex data issues and optimize data processing pipelines at scale.

Technical Skills :

  • Expertise in PySpark for distributed data processing, data transformation, and job optimization.
  • Strong proficiency in Python and SQL for data manipulation and pipeline creation.
  • Hands-on experience with Apache Spark and its ecosystem, including Spark SQL, Spark Streaming, and PySpark MLlib.
  • Solid experience working with ETL tools and frameworks, such as Apache Airflow or similar orchestration tools.
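Orchestration tools such as the Apache Airflow mentioned above express a pipeline as a DAG of dependent tasks. The following is a minimal configuration-style sketch, assuming Airflow 2.x; the DAG id, schedule, and task callables are purely illustrative, not this employer's actual pipeline:

```python
# Minimal Airflow 2.x DAG sketch (illustrative names, placeholder tasks).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():  # placeholder standing in for a real ingestion job
    ...


def transform():  # placeholder standing in for a real PySpark job
    ...


def load():  # placeholder standing in for a real warehouse load
    ...


with DAG(
    dag_id="daily_orders_etl",        # invented name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare the dependency order: extract, then transform, then load.
    t_extract >> t_transform >> t_load
```

Airflow then handles scheduling, retries, and backfills around this declared structure.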
  • (ref : hirist.tech)
