About Cognite :
Cognite is revolutionizing industrial data management through our flagship product Cognite Data Fusion, a state-of-the-art SaaS platform that transforms how industrial companies leverage their data.
We're seeking a Senior Data Platform Engineer who excels at building high-performance distributed systems and thrives in a fast-paced, startup-style environment. You'll work on cutting-edge data infrastructure challenges that directly impact how Fortune 500 industrial companies manage their most critical operational data.
Key Responsibilities :
1. High-Performance Data Systems :
- Design and implement scalable data processing pipelines using Apache Spark, Flink, and Kafka for terabyte-scale datasets.
- Build efficient APIs and backend services supporting thousands of concurrent users with sub-second latency.
- Optimize data storage and retrieval for time-series, sensor, and operational datasets.
- Implement advanced caching strategies using Redis and in-memory data structures.
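By way of illustration only, the sketch below shows the cache-aside pattern behind the caching bullet above, written in Scala with the Jedis client; the address, key, TTL, and loader are hypothetical, and a production service would use a connection pool rather than a single connection.

```scala
import redis.clients.jedis.Jedis

object AssetCache {
  private val redis = new Jedis("localhost", 6379) // placeholder address
  private val TtlSeconds = 60 // placeholder TTL

  // Cache-aside: serve from Redis when the key is present, otherwise
  // compute the value, store it with a TTL, and return it.
  def getOrLoad(key: String)(load: => String): String =
    Option(redis.get(key)).getOrElse {
      val value = load
      redis.setex(key, TtlSeconds, value)
      value
    }
}

// Usage with a hypothetical loader:
// val details = AssetCache.getOrLoad("asset:42")(loadAssetFromDb("42"))
```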
2. Distributed Processing Excellence :
- Engineer optimized Spark applications with deep knowledge of the Catalyst optimizer and partitioning strategies (sketched briefly after this list).
- Develop real-time streaming solutions processing millions of events per second using Kafka and Flink.
- Design efficient data lake architectures on S3/GCS using formats like Parquet and ORC.
- Implement query optimization for OLAP datastores such as ClickHouse, Pinot, or Druid.
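As a rough illustration of the first bullet above, this Scala sketch leans on two Spark techniques that Catalyst and good partitioning make cheap: pushing a filter down into the Parquet scan, and partitioning output by date so downstream queries prune whole directories. The paths, schema, and column names are assumed for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, to_date, window}

object HourlySensorRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hourly-sensor-rollup")
      .getOrCreate()

    // Hypothetical input: Parquet files of (sensor_id, ts: timestamp, value: double).
    val readings = spark.read.parquet("s3://example-bucket/readings/")

    // Catalyst pushes this filter down into the Parquet scan, so null rows
    // are skipped at the source instead of being read and then discarded.
    val hourly = readings
      .filter(col("value").isNotNull)
      .groupBy(col("sensor_id"), window(col("ts"), "1 hour").as("hour"))
      .agg(avg(col("value")).as("avg_value"))
      .withColumn("date", to_date(col("hour.start")))

    // Date-partitioned layout lets later queries prune partitions rather
    // than scanning the full dataset.
    hourly.write.mode("overwrite").partitionBy("date")
      .parquet("s3://example-bucket/hourly/")

    spark.stop()
  }
}
```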
3. Scalability & Performance :
- Scale systems to 10K+ QPS while maintaining high availability and data consistency.
- Tune JVM performance through garbage collection and memory optimization.
- Establish comprehensive monitoring using Prometheus, Grafana, and distributed tracing.
- Design fault-tolerant architectures with circuit breakers and retry mechanisms.
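The last bullet above can be illustrated with a minimal retry-with-exponential-backoff sketch; the names and parameters are made up for the example, and a production system would add jitter and a circuit breaker (for instance via a library such as resilience4j) rather than hand-roll this.

```scala
import scala.util.{Failure, Success, Try}

object Retry {
  // Retries `op` up to `attempts` times, doubling the delay after each
  // failure (exponential backoff). Illustrative only: real services add
  // jitter and a circuit breaker so a struggling dependency can recover.
  def withBackoff[T](attempts: Int, delayMs: Long)(op: => T): T =
    Try(op) match {
      case Success(value) => value
      case Failure(_) if attempts > 1 =>
        Thread.sleep(delayMs)
        withBackoff(attempts - 1, delayMs * 2)(op)
      case Failure(error) => throw error
    }
}

// Usage: three attempts starting at a 100 ms delay.
// val payload = Retry.withBackoff(attempts = 3, delayMs = 100L) {
//   fetchFromFlakyService() // hypothetical call
// }
```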
4. Technical Innovation :
- Contribute to open-source projects within the big data ecosystem (Spark, Kafka, Airflow).
- Research and prototype emerging technologies for industrial data challenges.
- Collaborate with product teams to deliver scalable and reliable technical solutions.
- Participate in architecture reviews and technical design discussions.
Requirements :
1. Distributed Systems Experience (4-6 years) :
- Proven production experience with Spark (building and optimizing large-scale applications).
- Strong proficiency with Kafka, Flink, or Spark Streaming for real-time data processing.
- Expertise in JVM languages (Java, Scala, or Kotlin) with performance tuning experience.
2. Data Platform Foundations (3+ years) :
- Hands-on experience with data lakes, columnar formats, and table formats (Iceberg, Delta Lake).
- Worked with OLAP query engines like Presto/Trino, ClickHouse, or Pinot.
- Built robust ETL/ELT pipelines using Airflow, dbt, or custom frameworks.
Technical Depth Indicators :
- Delivered measurable performance improvements (2x+ throughput gains).
- Optimized resource utilization and cost efficiency.
- Designed thread-safe, high-concurrency data processing systems.
- Implemented data quality frameworks and schema evolution management.
- Designed efficient schemas for analytical workloads.
Collaboration & Growth :
- Worked cross-functionally with PMs, ML engineers, and data scientists.
- Maintained high code quality through thoughtful reviews and documentation.
- Adapted quickly to new tools and frameworks.
- Demonstrated systematic debugging and problem-solving in distributed systems.
Startup Mindset :
- Delivered high-quality features under tight deadlines.
- Balanced technical debt, speed, and system reliability.
- Took end-to-end ownership from design to production.
- Thrived amid evolving requirements and ambiguity.
- Made customer-centric technical decisions.
Bonus Points :
- Contributions to Apache open-source projects (Spark, Kafka, Airflow).
- Public speaking or technical blogging experience.
- Industrial domain knowledge (IoT, manufacturing, operational systems).
(ref : hirist.tech)