Candidate should be able to:
Consult with business stakeholders and translate their requirements into analytics and reports on the data lake
Design and implement a data catalog that democratizes data access in a secure way
Develop and execute test plans to validate code
Investigate, recommend and implement data ingestion and ETL performance improvements
Apply best practices to the design and implementation of the data lake storage approach on the public cloud, including data stores, formats, encryption, compression, and access controls (see the sketch after this list)
Connect applications to the data lake for data consumption
Onboard datasets end-to-end to the data lake and provision the relevant access
Uncover data quality anomalies and recommend remediation
Build and implement complex data solutions in the cloud
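As a rough illustration of the storage and ETL responsibilities above, here is a minimal PySpark sketch that lands a raw extract into a partitioned, compressed columnar layout on a cloud data lake. The bucket paths, column name, and job name are hypothetical placeholders, not part of this posting.

```python
from pyspark.sql import SparkSession

# Build a Spark session; the application name is a placeholder.
spark = SparkSession.builder.appName("orders-ingest").getOrCreate()

# Read a raw CSV extract from a hypothetical landing-zone bucket.
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3a://example-landing-zone/orders/")
)

# Write to the data lake as Snappy-compressed Parquet, partitioned by
# a hypothetical order_date column for efficient pruning on reads.
(
    raw.write
    .mode("append")
    .partitionBy("order_date")
    .option("compression", "snappy")
    .parquet("s3a://example-data-lake/curated/orders/")
)
```

Partitioning by a date column and writing Snappy-compressed Parquet is one common pattern for balancing scan performance against storage cost; the role would involve evaluating such choices per dataset.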
Candidate should have:
Bachelor's degree or equivalent experience in Computer Science, Engineering, or Mathematics
Experience developing software in one or more programming languages (Java, Python, etc.)
Experience with one or more SQL-on-Hadoop technologies (Hive, Impala, Spark SQL, Presto); see the Spark SQL sketch after this list
Experience with one or more relevant tools (Sqoop, Flume, Kafka, Oozie, Hue, Zookeeper, HCatalog, Solr, Avro)
Experience with Apache Hadoop and the Hadoop ecosystem, including hands-on implementation and performance tuning of Hadoop/Spark deployments
Experience with cloud and commercial Data Lake platforms
Understanding of database and analytical technologies in the industry, including MPP and NoSQL databases, data warehouse design, BI reporting, and dashboard development
Experience implementing AWS services in a variety of distributed computing and enterprise environments
Ability to collaborate effectively across organizations
Ability to think strategically about business, product, and technical challenges in an enterprise environment
Demonstrated industry expertise in the fields of database, data warehousing, or data science
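To make the SQL-on-Hadoop expectation above concrete, here is a minimal Spark SQL sketch. The database, table, and column names (curated.orders, order_date, amount) are hypothetical and only illustrate the kind of query the role involves.

```python
from pyspark.sql import SparkSession

# Enable Hive support so tables registered in the Hive metastore
# are visible to Spark SQL; all names below are placeholders.
spark = (
    SparkSession.builder
    .appName("daily-revenue-report")
    .enableHiveSupport()
    .getOrCreate()
)

# Aggregate a hypothetical curated.orders table into daily revenue.
daily_revenue = spark.sql("""
    SELECT order_date,
           SUM(amount) AS revenue
    FROM curated.orders
    GROUP BY order_date
    ORDER BY order_date
""")

daily_revenue.show()
```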
Senior Data Engineer • Chennai, India