Data Platform Engineer

BharatGen, Mumbai Metropolitan Region, India
8 hours ago
Job description

Job Summary:

BharatGen is on a mission to create AI that truly represents the diversity, culture, and unique context of India. At the heart of this mission lies the need for robust, scalable infrastructure to build the multilingual and multimodal datasets that power foundational AI models. We’re seeking a skilled Data Platform Engineer to build the tools, platforms, and pipelines that process these datasets at scale.

In this role, you will build scalable data pipelines to ingest, transform, and prepare data from diverse sources (text, speech, images, and video), making it ready for Generative AI model training. Your work will involve developing and managing the underlying platform while addressing challenges like governance, security, observability, lineage, and scalability. The outcomes of your work will include efficient tools for data processing, a reliable data platform, and high-quality datasets tailored to the evolving needs of large-scale AI and LLM training.

Collaborating closely with researchers and ML engineers, you will play a pivotal role in enabling BharatGen to deliver state-of-the-art AI models, contributing to the advancement of India’s AI ecosystem through innovative data engineering solutions.

Key Responsibilities:

  • Design and Build Scalable Platforms: Develop distributed infrastructure for ingesting, processing, and transforming diverse datasets (text, speech, images, video) at terabyte to petabyte scale.
  • Develop Robust Data Pipelines: Create reliable, scalable pipelines to prepare datasets for Generative AI and LLM training.
  • Implement Governance and Observability: Build frameworks for data lineage, monitoring, and access control to ensure data quality and operational reliability.
  • Optimize Performance and Cost: Enhance platform performance and resource utilization using cost-effective strategies, including GPU-accelerated preprocessing.
  • Collaborate and Innovate: Work closely with researchers and ML engineers to adapt platforms and data pipelines to evolving LLM requirements, addressing various data challenges.
  • Drive Innovation: Stay updated on emerging tools, frameworks, and best practices to implement cutting-edge solutions for large-scale dataset creation.

Minimum Qualifications and Experience:

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field with 3+ years of industry experience.

Required Skills:

  • Proficiency in distributed systems and frameworks (e.g., Kafka, Ray, PySpark) for scalable data workflows.
  • Exposure to end-to-end data lifecycle management, including DataOps.
  • Strong programming skills in Python, Scala, or Go, with a focus on high-performance pipeline development.
  • Experience with building and optimizing data pipelines, including ETL processes, data modeling, and integration into scalable workflows.
  • Expertise in data scraping, crawling frameworks, and modern dataset development techniques such as synthetic data generation.
  • Experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Docker, Kubernetes).
  • Deep understanding of data platform design, including data architecture, metadata tracking, data lineage, observability, monitoring, and scalability best practices.
  • Familiarity with Infrastructure-as-Code tools (e.g., Terraform, CloudFormation), CI/CD pipelines, relational and NoSQL databases, and GPU-accelerated workflows.
  • Familiarity with visualization and monitoring tools for lifecycle management and pipeline performance tracking.
  • Expertise in managing unstructured data (text, speech, or multimodal datasets) for high-performance use cases, ideally in the context of LLM/AI datasets.
  • Understanding of challenges in scalable data engineering, including ingestion, transformation, and storage optimization for large-scale accelerated workflows.