About the role :
We are seeking a highly motivated and detail-oriented AI Data Collection Expert to join our innovative AI / ML team. You will be instrumental in the end-to-end process of gathering, annotating, and curating high-quality datasets that serve as the foundation for our cutting-edge AI and machine learning models. This role requires a meticulous eye for detail, a proactive approach to problem-solving, and a commitment to data integrity and quality.
Key responsibilities :
Data sourcing and acquisition : Collaborate with AI / ML researchers and data scientists to understand data requirements and execute comprehensive data acquisition strategies.
Dataset curation and management : Gather, curate, and maintain large-scale datasets, including text, images, audio, and video, from various sources such as websites, databases, and APIs.
Data labeling and annotation : Use specialized tools and platforms to accurately label and annotate diverse datasets for supervised learning models, ensuring clarity and consistency.
Quality assurance and validation : Implement and manage robust quality control frameworks to audit datasets, identify inconsistencies or inaccuracies, and maintain a high data acceptance rate.
Process automation : Identify opportunities to automate and optimize data collection pipelines and labeling workflows to improve efficiency and scalability.
Collaboration : Work closely with cross-functional teams, including product managers and software engineers, to define data requirements and integrate data collection processes into broader product initiatives.
Documentation : Create and maintain detailed documentation of data sources, collection methodologies, and quality assurance processes for transparency and reproducibility.
Compliance and ethics : Ensure all data collection activities comply with relevant data privacy and security regulations (e.g., GDPR, CCPA) and ethical AI principles.
Required qualifications :
Education : Bachelor's or Master's degree in Computer Science or a related field.
Technical experience : Proven experience in data collection, data engineering, or data annotation, preferably within an AI / ML context.
Programming proficiency : Strong skills in a scripting language such as Python, plus experience with SQL.
Tool familiarity : Experience with data collection tools (e.g., web scraping libraries like Scrapy or Selenium) and data annotation platforms (e.g., Labelbox, Amazon SageMaker Ground Truth).
Cloud experience (preferred) : Experience with cloud data services on platforms like AWS, GCP, or Azure.
Problem-solving : Exceptional analytical and problem-solving abilities with an unwavering commitment to data quality.
Attention to detail : Meticulous care in ensuring the accuracy and integrity of datasets.
Communication : Excellent written and verbal communication skills, with the ability to effectively document processes and collaborate with technical and non-technical stakeholders.
Preferred qualifications :
1+ years of previous data collection experience.
Data • India