We are seeking a Senior Data Engineer with 8+ years of real-world experience to act as a Subject Matter Expert (SME) and train students specifically for Data Engineering, Big Data, and AI / ML Data Platform interviews.
The ideal candidate will not only be hands-on with the latest data + AI / ML technologies but also excel at coaching, mentoring, and conducting mock interviews to prepare students for success in top-tier companies.
This role is perfect for someone who can bridge deep technical expertise with interview preparation and training.
Key Responsibilities
Act as SME & trainer, guiding students to crack Data Engineering, AI / ML, and Cloud Data Platform interviews.
Conduct mock interviews, Q&A sessions, and technical deep-dives.
Train students on real-world interview scenarios:
End-to-end ETL / ELT pipelines (a minimal pipeline sketch follows this list)
Data modelling & warehousing
Data for AI / ML use cases (feature pipelines, vector databases, embeddings)
Streaming & batch processing at scale
Data governance, lineage, and security-first architectures
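To anchor the first scenario, here is a minimal batch ETL sketch in PySpark of the kind a mock interview might whiteboard; the bucket paths and column names are hypothetical stand-ins, not references to any real dataset.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: raw CSV drop from an upstream system (path is illustrative).
orders = spark.read.csv("s3://raw-zone/orders/", header=True, inferSchema=True)

# Transform: deduplicate on the business key, derive revenue, drop bad rows.
clean = (
    orders.dropDuplicates(["order_id"])
          .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
          .filter(F.col("order_date").isNotNull())
)

# Load: write partitioned Parquet into the curated zone.
clean.write.mode("overwrite").partitionBy("order_date").parquet("s3://curated-zone/orders/")
```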
Core Data Engineering + AI / ML Pillars to Cover
1. Data Warehousing & Modelling
Star, Snowflake, and Data Vault modelling
SCD Types 1–6 (a Type 2 sketch follows this list)
OLTP vs OLAP vs Lakehouse
Schema evolution & data versioning
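A staple exercise from this pillar is implementing Slowly Changing Dimension Type 2. Below is a minimal pandas sketch; the dimension layout (a customer_id business key plus valid_from / valid_to / is_current columns) is an assumed convention, and the function mutates its input for brevity.

```python
import pandas as pd

def scd2_merge(dim: pd.DataFrame, snapshot: pd.DataFrame, today: str) -> pd.DataFrame:
    """SCD Type 2 merge: expire changed rows, append new current versions."""
    current = dim[dim["is_current"]]
    merged = current.merge(snapshot, on="customer_id", suffixes=("_old", ""))
    changed_ids = merged.loc[merged["city_old"] != merged["city"], "customer_id"]

    # Expire the current version of every changed row.
    expire = dim["customer_id"].isin(changed_ids) & dim["is_current"]
    dim.loc[expire, ["valid_to", "is_current"]] = [today, False]

    # Append a fresh current version for changed and brand-new keys.
    new_keys = set(changed_ids) | (set(snapshot["customer_id"]) - set(dim["customer_id"]))
    new_rows = snapshot[snapshot["customer_id"].isin(new_keys)].assign(
        valid_from=today, valid_to=None, is_current=True
    )
    return pd.concat([dim, new_rows], ignore_index=True)
```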
2. ETL / ELT & Orchestration
Orchestration: Apache Airflow, Prefect, Dagster, Azure Data Factory (an Airflow sketch follows this list)
Transformation: dbt, Spark SQL, Pandas, PySpark
Batch & streaming workflows (Kafka, Flink, Spark Structured Streaming, Kinesis)
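To make the orchestration topic concrete, here is a minimal Apache Airflow DAG (assuming Airflow 2.x); the task bodies are placeholders for whatever extract and load logic a given exercise requires.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull data from a source system.
    print("extracting")

def load():
    # Placeholder: write transformed data to the warehouse.
    print("loading")

with DAG(
    dag_id="daily_orders",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # `schedule_interval` on Airflow < 2.4
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load
```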
3. Big Data & Distributed Processing
Spark (PySpark, Scala, Delta Lake), Hive, Presto / Trino, Iceberg, Hudi
Partitioning, bucketing, caching & shuffle optimisation (a tuning sketch follows this list)
Lakehouse & Data Mesh architectures
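Interviewers routinely probe shuffle behaviour, so a PySpark snippet like the following is worth rehearsing; the table paths and partition count are illustrative, and broadcasting is only appropriate when one side genuinely fits in executor memory.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("join_tuning").getOrCreate()

# Lower the default shuffle partition count for a modest dataset.
spark.conf.set("spark.sql.shuffle.partitions", "64")

facts = spark.read.parquet("s3://curated-zone/orders/")     # large fact table
dims = spark.read.parquet("s3://curated-zone/customers/")   # small dimension

# Broadcast the small side to avoid a full sort-merge shuffle join.
joined = facts.join(F.broadcast(dims), on="customer_id", how="left")

# Cache only if the result is reused by several downstream actions.
joined.cache()

# Partition by a low-cardinality column to keep file counts sane.
joined.write.mode("overwrite").partitionBy("order_date").parquet("s3://marts/orders_enriched/")
```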
4. Cloud Data Platforms
Snowflake, Databricks, BigQuery, Redshift, Synapse
Multi-cloud (AWS, Azure, GCP) + hybrid / on-prem migration
5. Data Storage & Ingestion
Data Lakes (S3, ADLS, GCS)
Semi-structured / unstructured data (Parquet, ORC, Avro, JSON, XML, multimedia)
Real-time ingestion (Kafka, Pulsar, Debezium, CDC pipelines; a streaming-read sketch follows this list)
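Real-time ingestion questions often come down to wiring Kafka into Spark Structured Streaming. The sketch below shows the canonical read-parse-write loop, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic, schema, and paths are all hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka_ingest").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw Kafka stream (the value column arrives as bytes).
raw = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "orders")
         .load()
)

# Parse the JSON payload into typed columns.
events = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e")).select("e.*")

# Append to the lake; the checkpoint gives exactly-once file output.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3://raw-zone/orders_stream/")
          .option("checkpointLocation", "s3://raw-zone/_checkpoints/orders/")
          .start()
)
```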
6. Observability & Monitoring
Pipeline observability (Prometheus, Grafana, ELK, CloudWatch, Datadog)
Data quality & reliability (Great Expectations, Soda, Deequ; a hand-rolled check sketch follows this list)
End-to-end lineage & metadata (Apache Atlas, DataHub, Purview, Collibra)
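Tools such as Great Expectations, Soda, and Deequ formalise assertions like the ones below; since their APIs differ by version, this sketch hand-rolls the same idea in plain pandas so the underlying checks stay obvious. The file path and column names are hypothetical.

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> dict:
    """Return a check-name -> passed mapping for a few typical assertions."""
    return {
        "order_id_not_null": df["order_id"].notna().all(),
        "order_id_unique": df["order_id"].is_unique,
        "amount_non_negative": (df["amount"] >= 0).all(),
        "status_in_allowed_set": df["status"].isin({"NEW", "PAID", "SHIPPED"}).all(),
    }

results = run_quality_checks(pd.read_parquet("orders.parquet"))
failed = [name for name, passed in results.items() if not passed]
if failed:
    # In a real pipeline this would page on-call or quarantine the batch.
    raise ValueError(f"data quality checks failed: {failed}")
```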
7. Security & Governance
IAM, RBAC, ABAC, fine-grained access controls
Data masking, tokenization, PII handling (a masking sketch follows this list)
GDPR, HIPAA, CCPA compliance
Secrets management (Vault, Key Vault, KMS)
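A common security exercise is deterministic tokenization of PII so joins still work after masking. This sketch uses HMAC-SHA-256 in plain Python; in practice the secret would come from a manager such as Vault or KMS, and the column handling is illustrative.

```python
import hashlib
import hmac
import os

# In production, fetch this from Vault / Key Vault / KMS, never hard-code it.
SECRET = os.environ.get("PII_TOKEN_KEY", "dev-only-secret").encode()

def tokenize(value: str) -> str:
    """Deterministic token: equal inputs map to equal tokens, so joins survive."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()

def mask_email(email: str) -> str:
    """Readable mask that keeps only the domain for analytics."""
    _, _, domain = email.partition("@")
    return f"***@{domain}"

print(tokenize("alice@example.com"))   # stable pseudonymous key
print(mask_email("alice@example.com"))  # ***@example.com
```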
8. AI / ML Enablement
Feature Engineering Pipelines: building scalable feature pipelines for ML models
Feature Stores: Feast, Tecton, Databricks Feature Store
MLOps Practices: MLflow, SageMaker Pipelines, Vertex AI, Azure ML
Vector Databases & RAG: Pinecone, Weaviate, Milvus, Chroma for LLM apps (a retrieval sketch follows this list)
Model Serving: TensorFlow Serving, TorchServe, Kubernetes-based serving
AI Workflows: data preparation for NLP, embeddings, and recommendation systems
LLM Integration: prompt engineering, embedding pipelines, data optimisation for GenAI workloads
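Candidates should be able to explain what a vector database actually does under the hood. The numpy sketch below shows the core operation, brute-force cosine-similarity retrieval over stored embeddings, which engines like Pinecone, Weaviate, Milvus, and Chroma accelerate with approximate-nearest-neighbour indexes; the embeddings here are random stand-ins for real model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for document embeddings from a real model (e.g. 384-dim vectors).
docs = rng.normal(size=(1000, 384)).astype(np.float32)
query = rng.normal(size=384).astype(np.float32)

# Normalise once so cosine similarity reduces to a dot product.
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)

# Brute-force top-k retrieval; ANN indexes (HNSW, IVF) approximate this at scale.
scores = docs_n @ query_n
top_k = np.argsort(scores)[::-1][:5]
print(top_k, scores[top_k])
```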
Required Skills & Experience
8+ years in Data Engineering / Big Data / Cloud roles, with hands-on experience in AI / ML data enablement.
Proven expertise with ETL / ELT, Spark, Data Lakes, Data Warehouses, and distributed systems.
Strong programming skills in Python, SQL, and PySpark (Scala / R optional).
Expertise in cloud-native data + AI / ML platforms (AWS SageMaker, Azure ML, GCP Vertex AI, Databricks).
Hands-on with MLOps, Feature Stores, Vector Databases, and ML model integration.
Deep understanding of pipeline optimisation, governance, and cost efficiency.
Excellent communication & teaching skills — ability to simplify complex data & AI concepts.
Prior mentoring, training, or interview-prep experience strongly preferred.
Senior Data Engineer • Nagpur, IN