Data Engineer (Webscraping)

Solytics Partners, Thoothukudi, IN
13 hours ago
Job description

Company Profile :

Solytics Partners is a global analytics firm, recognized with multiple industry awards for innovation and excellence. Our team comprises experts with deep knowledge in risk, analytics, AI / ML, AML / FCC, and fraud. By combining this expertise with cutting-edge technologies such as AI, Machine Learning, Generative AI, and Large Language Models (LLMs), we deliver powerful automated platforms and incisive point solutions. Our offerings enable clients to streamline and future-proof their risk, AML, and analytics processes, comply seamlessly with global regulations, and safeguard financial systems. Whether it’s solving complex challenges or driving operational efficiency, Solytics Partners is committed to empowering organizations with transformative tools to stay ahead in an evolving regulatory landscape.

Job Title : Data Engineer (Web Scraping)

Experience : 5 – 10 years of relevant experience

Location & Timings : Pune (work from office), 11:00 AM – 8:00 PM

Education Qualification : Master's or bachelor's degree in computer science, IT, or another relevant discipline from a reputed institute.

Role Type : Permanent / Full Time

Job Description : We are seeking an experienced Data Engineering & Automation Lead to design, automate, and optimize large-scale data processing and web scraping pipelines. The role involves leading a team to build and maintain high-performance ETL workflows using Apache Airflow, Apache Spark, and AWS services, while integrating AI / NLP solutions powered by OpenAI GPT and other GenAI models for intelligent data extraction and analytics.

Responsibilities :

  • Design, automate, and maintain ETL and data processing pipelines using Apache Airflow and Apache Spark.
  • Build, monitor, and optimize web scraping and data extraction workflows for global compliance and risk data sources.
  • Lead and manage web scraping and data engineering teams, ensuring delivery excellence, code quality, and scalability.
  • Create, design, and document automation workflows and secure data-sharing systems using AWS (Lambda, S3, API Gateway, SQS).
  • Implement AI and NLP integrations using OpenAI GPT and GenAI models for intelligent data extraction, tagging, and analytical automation.
  • Analyze large-scale datasets to identify quality gaps, improve accuracy, and optimize indexing and retrieval performance.
  • Collaborate with Backend, DevOps, and Frontend teams for data modeling, monitoring, and visualization.
  • Work closely with clients to gather and translate business requirements into scalable automation and analytics solutions.
  • Author HLD / LLD documentation, mentor junior engineers, and continuously improve automation processes and data workflows.

Required Skills :

  • Programming : Python, SQL, JavaScript
  • Data Engineering & Automation : Apache Airflow, Apache Spark, Web Scraping (Scrapy, Selenium), Pandas, NumPy
  • Databases & Storage : Elasticsearch, MongoDB, MySQL
  • Cloud & Backend : AWS (Lambda, S3, EC2, CloudWatch, SQS, SNS, EKS), Docker, Django, Flask
  • AI / ML & NLP : OpenAI GPT APIs, NER, Sentiment Analysis, Embeddings, Information Extraction
  • Monitoring & Tools : Grafana, Git, Postman, Jupyter, VS Code

Good to Have :

  • Strong understanding of Large Language Models (LLMs) and Generative AI for building intelligent data extraction and analytics agents.
  • Familiarity with risk and compliance domains, including Sanctions, PEP (Politically Exposed Persons), and AMS (Adverse Media Screening) data and processes.

Soft Skills :

  • Leadership & Team Mentoring
  • Problem-Solving & Analytical Thinking
  • Clear Technical Communication
  • Cross-functional Collaboration