Talent.com
Data Engineer (Web Scraping)

Solytics Partners · Pune, Republic of India, IN
8 days ago
Job description

Company Profile :

Solytics Partners is a global analytics firm, recognized with multiple industry awards for innovation and excellence. Our team comprises experts with deep knowledge in risk, analytics, AI / ML, AML / FCC, and fraud. By combining this expertise with cutting-edge technologies such as AI, Machine Learning, Generative AI, and Large Language Models (LLMs), we deliver powerful automated platforms and incisive point solutions. Our offerings enable clients to streamline and future-proof their risk, AML, and analytics processes, comply seamlessly with global regulations, and safeguard financial systems. Whether solving complex challenges or driving operational efficiency, Solytics Partners is committed to empowering organizations with transformative tools to stay ahead in an evolving regulatory landscape.

Job Title : Data Engineer (Web Scraping)

Experience : 5 – 10 years of relevant experience

Location & Timings : Pune, work from office; 11:00 AM – 8:00 PM

Education Qualification : Master's or Bachelor's degree in Computer Science, IT, or another relevant discipline from a reputed institute.

Role Type : Permanent / Full Time

Job Description : We are seeking an experienced Data Engineering & Automation Lead to design, automate, and optimize large-scale data processing and web scraping pipelines. The role involves leading a team to build and maintain high-performance ETL workflows using Apache Airflow, Apache Spark, and AWS services, while integrating AI / NLP solutions powered by OpenAI GPT and other GenAI models for intelligent data extraction and analytics.

Responsibilities :

  • Design, automate, and maintain ETL and data processing pipelines using Apache Airflow and Apache Spark.
  • Build, monitor, and optimize web scraping and data extraction workflows for global compliance and risk data sources.
  • Lead and manage web scraping and data engineering teams, ensuring delivery excellence, code quality, and scalability.
  • Design, build, and document automation workflows and secure data-sharing systems using AWS (Lambda, S3, API Gateway, SQS).
  • Implement AI and NLP integrations using OpenAI GPT and GenAI models for intelligent data extraction, tagging, and analytical automation.
  • Analyze large-scale datasets to identify quality gaps, improve accuracy, and optimize indexing and retrieval performance.
  • Collaborate with Backend, DevOps, and Frontend teams for data modeling, monitoring, and visualization.
  • Work closely with clients to gather and translate business requirements into scalable automation and analytics solutions.
  • Author HLD / LLD documentation, mentor junior engineers, and continuously improve automation processes and data workflows.

Required Skills :

  • Programming : Python, SQL, JavaScript
  • Data Engineering & Automation : Apache Airflow, Apache Spark, Web Scraping (Scrapy, Selenium), Pandas, NumPy
  • Databases & Storage : Elasticsearch, MongoDB, MySQL
  • Cloud & Backend : AWS (Lambda, S3, EC2, CloudWatch, SQS, SNS, EKS), Docker, Django, Flask
  • AI / ML & NLP : OpenAI GPT APIs, NER, Sentiment Analysis, Embeddings, Information Extraction
  • Monitoring & Tools : Grafana, Git, Postman, Jupyter, VS Code

Good to Have :

  • Strong understanding of Large Language Models (LLMs) and Generative AI for building intelligent data extraction and analytics agents.
  • Familiarity with risk and compliance domains, including Sanctions, PEP (Politically Exposed Persons), and AMS (Adverse Media Screening) data and processes.

Soft Skills :

  • Leadership & Team Mentoring
  • Problem-Solving & Analytical Thinking
  • Clear Technical Communication
  • Cross-functional Collaboration