We are seeking a skilled and motivated Data Engineer to join our team. The ideal candidate will be responsible for designing, implementing, and maintaining our data infrastructure to support our B2B intelligence platform.
Responsibilities
- Design, build, and maintain scalable data pipelines for collecting, processing, and storing large volumes of business data
- Develop ETL processes to integrate data from various sources, including web scraping, APIs, and third-party data providers
- Implement data quality checks and monitoring systems to ensure data accuracy and integrity
- Optimize data storage and retrieval processes for high performance and scalability
- Collaborate with data scientists to develop, deploy, and maintain machine learning models in production environments
- Work with the backend team to design and implement APIs for data access
- Implement data security and privacy measures to protect sensitive information
- Stay up to date with the latest big data technologies and best practices
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- 4+ years of experience in data engineering roles
- Strong programming skills in Python, Scala, and/or Java
- Expertise in SQL and experience with NoSQL databases (e.g., MongoDB, Cassandra)
- Proficiency with big data technologies such as Apache Spark, Hadoop, and Kafka
- Experience with cloud platforms (AWS, GCP, or Azure) and their data services
- Familiarity with data warehousing concepts and ETL processes
- Experience with data warehousing solutions (e.g., AWS Redshift, Snowflake)
- Knowledge of data modelling, data architecture, and data pipeline design
- Experience with version control systems (e.g., Git) and CI/CD practices
- Excellent problem-solving skills and attention to detail
Preferred Qualifications
- Experience in the B2B data or sales intelligence industry
- Familiarity with web scraping techniques and tools
- Knowledge of data privacy regulations (e.g., GDPR, CCPA)
- Experience with real-time data processing and streaming architectures