Roles & Responsibilities:
- Design, develop, and maintain data solutions for data generation, collection, and processing
- Be a key team member who assists in the design and development of the data pipeline
- Create data pipelines and ensure data quality by implementing ETL processes to migrate and deploy data across systems
- Contribute to the design, development, and implementation of data pipelines, ETL / ELT processes, and data integration solutions
- Collaborate with cross-functional teams to understand data requirements and design solutions that meet business needs
- Develop and maintain data models, data dictionaries, and other documentation to ensure data accuracy and consistency
- Implement data security and privacy measures to protect sensitive data
- Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions
- Collaborate and communicate effectively with product teams
- Identify and resolve complex data-related challenges
- Adhere to best practices for coding, testing, and designing reusable code and components
- Explore new tools and technologies that will help to improve ETL platform performance
- Participate in sprint planning meetings and provide estimates for technical implementation
What we expect of you
We are all different, yet we all use our unique contributions to serve patients.
Basic Qualifications:
- Bachelor's degree and 0 to 3 years of Computer Science, IT, or related field experience, OR
- Diploma and 4 to 7 years of Computer Science, IT, or related field experience
Preferred Qualifications:
Functional Skills:
Must-Have Skills:
- Hands-on experience with big data technologies and platforms such as Databricks, Apache Spark (PySpark, SparkSQL), AWS, Redshift, and Snowflake, including workflow orchestration and performance tuning on big data processing
- Proficiency in data analysis tools (e.g., SQL) and experience with data visualization tools
- Proficient in SQL for extracting, transforming, and analyzing complex datasets from relational data stores
- Experience with ETL tools such as Apache Spark and various Python packages related to data processing and machine learning model development
Good-to-Have Skills:
- Experience with data modeling and performance tuning on relational and graph databases (e.g., MarkLogic, AllegroGraph, Stardog, RDF triplestores)
- Understanding of data modeling, data warehousing, and data integration concepts
- Knowledge of Python / R, Databricks, SageMaker, and cloud data platforms
- Experience with software engineering best practices, including but not limited to version control, infrastructure-as-code, CI/CD, and automated testing
Professional Certifications:
- AWS Certified Data Engineer preferred
- Databricks Certification preferred
Skills Required:
PySpark, SparkSQL, SQL, AWS, ETL