Position Summary
Role Value Proposition:
Collaborate with a nimble, autonomous, cross-functional team of makers, breakers, doers, and disruptors who love to solve real problems and meet real customer needs.
You will use cutting-edge technologies and frameworks to analyze data, help build the data pipeline, and collaborate with the data science team to enable innovative work in machine learning and AI.
An eagerness to learn new technologies on the fly and ship them to production is essential.
Knowledge of data science is a plus.
This is more than just a job: we hire people who love what they do!
Job Responsibilities
- Building and implementing data ingestion and curation processes using big data tools such as Spark (Scala/Python), Databricks, Delta Lake, Hive, Pig, HDFS, Oozie, Sqoop, Flume, ZooKeeper, Kerberos, Sentry, and Impala (a sketch follows this list).
- Ingesting huge volumes of data from various platforms for analytics needs and writing high-performance, reliable, and maintainable ETL code.
- Monitoring performance and advising on any necessary infrastructure changes.
- Defining data security principles and policies using Apache Ranger and Kerberos.
- Assisting application developers and advising on efficient big data application development using cutting-edge technologies.
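To give a flavor of the ingestion-and-curation work described above, here is a minimal Spark/Scala sketch that reads a raw landing zone and writes a de-duplicated, timestamped curated layer to Delta Lake. The paths, column names, and schema are hypothetical illustrations, not part of this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object IngestOrders {
  def main(args: Array[String]): Unit = {
    // Delta Lake support is wired in through the session config
    val spark = SparkSession.builder()
      .appName("orders-ingestion")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
              "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Raw landing zone -> typed, de-duplicated curated layer
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/orders/")          // hypothetical source path

    val curated = raw
      .dropDuplicates("order_id")              // hypothetical key column
      .filter(col("order_id").isNotNull)
      .withColumn("ingest_ts", current_timestamp())

    curated.write
      .format("delta")
      .mode("append")
      .partitionBy("order_date")               // assumes an order_date column in the source
      .save("hdfs:///curated/orders/")         // hypothetical target path

    spark.stop()
  }
}
```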
Knowledge, Skills and Abilities
Education
Bachelor's degree in Computer Science, Engineering, or a related discipline
Experience
- 4+ years of solutions development experience
- Proficiency and extensive experience with Spark and Scala, Python, and performance tuning is a MUST
- Hive database management and performance tuning (partitioning/bucketing) is a MUST
- Strong SQL knowledge and data analysis skills for data anomaly detection and data quality assurance
- Strong analytical skills related to working with unstructured datasets
- Experience building stream-processing systems using solutions such as Storm or Spark Streaming (a streaming sketch follows this list)
- Experience with model management methodologies
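To illustrate the stream-processing experience called for above, here is a minimal sketch of a Spark Structured Streaming job that consumes a Kafka topic and writes windowed counts. The broker address, topic name, and paths are hypothetical assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("clickstream-counts").getOrCreate()

    // Read a Kafka topic as an unbounded DataFrame
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical brokers
      .option("subscribe", "clickstream")                 // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Tumbling one-minute counts; the watermark bounds state for late events
    val counts = events
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    counts.writeStream
      .format("parquet")
      .option("path", "hdfs:///curated/click_counts/")    // hypothetical sink path
      .option("checkpointLocation", "hdfs:///chk/click_counts/")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```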
Knowledge and skills (general and technical)
Required:
- Proficiency and extensive experience in HDFS, Hive, Spark, Scala, Python, Databricks/Delta Lake, Flume, Kafka, etc.
- Analytical skills to assess situations and arrive at optimal, efficient solutions based on requirements
- Performance tuning and problem-solving skills are a must
- Hive database management and performance tuning (partitioning/bucketing) is a MUST (see the DDL sketch after this list)
- Hands-on development experience and high proficiency in Java, Python, Scala, and SQL
- Experience designing multi-tenant, containerized Hadoop architectures for memory/CPU management and sharing across different LOBs
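Since Hive partitioning and bucketing come up repeatedly as a MUST, here is a small sketch of the kind of DDL involved, issued through Spark with Hive metastore support. The database, table, and column names are hypothetical. Partitioning by a low-cardinality column lets queries prune whole directories; bucketing on a high-cardinality join key reduces shuffle on joins.

```scala
import org.apache.spark.sql.SparkSession

object CuratedTableDdl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("curated-ddl")
      .enableHiveSupport()                 // register tables in the Hive metastore
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS curated")

    // Partitioned by order_date (pruning) and bucketed by customer_id (join locality)
    spark.sql("""
      CREATE TABLE IF NOT EXISTS curated.orders (   -- hypothetical table
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(12, 2)
      )
      PARTITIONED BY (order_date DATE)
      CLUSTERED BY (customer_id) INTO 64 BUCKETS
      STORED AS ORC
    """)

    spark.stop()
  }
}
```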
Preferred:
- Knowledge of data science is a plus
- Experience with Informatica PC/BDM 10, including implementing pushdown processing into the Hadoop platform, is a huge plus
- Proficiency in using Git, Bamboo, and other continuous integration and deployment tools
- Exposure to data governance principles such as metadata and lineage (Collibra/Atlas)
Skills Required
Flume, Sentry, Scala, Kafka, HDFS, Impala, SQL, Hive, Sqoop, Kerberos, ZooKeeper, Spark, Oozie, Databricks, Python