Work location : Pune (working from the office is mandatory; in-office / hybrid)
Experience : 6 to 12 years of overall IT experience, including at least 4 years of relevant experience as a Data Engineer
Join us in building scalable data pipelines using Python (pandas / Polars), SQL, Airflow, and Azure DevOps. If you love solving problems across diverse data sources (APIs, PDFs, web scraping) and working hands-on with pandas / Polars, SQL, and test automation, this role is for you.
Tech stack :
Main / essential : Python, pandas and / or Polars, web scraping (including with Selenium), SQL, Azure DevOps, and Airflow
Additional : Databricks, AWS, Jenkins, ADO Pipelines
Key Responsibilities :
- Design, build, and maintain pipelines in Python to collect data from a wide range of sources (APIs, SFTP servers, websites, emails, PDFs, etc.)
- Deploy and orchestrate workflows using Apache Airflow
- Perform web scraping using libraries such as requests, BeautifulSoup, and Selenium
- Handle structured, semi-structured, and unstructured data efficiently
- Transform datasets using pandas and / or polars
- Write unit and component tests using pytest
- Collaborate with platform teams to improve the data scraping framework
- Query and analyze data using SQL (PostgreSQL, MSSQL, Databricks)
- Conduct code reviews, support best practices, and improve coding standards across the team
- Manage and maintain CI / CD pipelines (Azure DevOps Pipelines, Jenkins)
Required Skills & Experience :
- Proficient in Python, with deep experience using pandas or polars
- Strong understanding of ETL development, data extraction, and transformation
- Hands-on experience with SQL and querying large datasets
- Experience deploying workflows on Apache Airflow
- Familiar with web scraping techniques (Selenium is a plus)
- Comfortable working with various data formats and large-scale datasets
- Experience with Azure DevOps, including pipeline configuration and automation
- Familiarity with pytest or equivalent test frameworks
- Strong communication skills and a team-first attitude
- Experience with Databricks
- Familiarity with AWS services
- Working knowledge of Jenkins and advanced ADO Pipelines
(ref : hirist.tech)