DATAmundi builds advanced software solutions that power localization and data services, supporting AI companies and research teams with high-quality datasets and validation workflows. The Technical Project Manager role involves coordinating data deliveries, translating client requirements into validation logic, and ensuring data quality and consistency for AI data processing workflows.
Responsibilities:
- Own the end-to-end coordination of data deliveries, from intake to validation and handoff
- Work with client data delivered via S3 buckets or direct uploads; ensure correct structure, completeness, and readiness for downstream use
- Translate client guidelines into automated validation using SQL, regex, and supporting scripts
- Create and maintain Python utilities to compute quality and consistency metrics such as:
  - WER (Word Error Rate)
  - IAA (Inter-Annotator Agreement)
  - Additional dataset-level metrics as required
- Use Windows Command Prompt for bulk file operations (creating/moving/downloading folders and files) to support processing and delivery workflows
- Partner with internal development teams by writing Jira tickets for platform improvements and bug fixes (requirements, steps to reproduce, acceptance criteria)
- Quickly ramp up on internal platforms and configuration logic (e.g., worktypes / templates), and advise on setup patterns and tradeoffs
- Investigate issues by querying datasets through database tools and producing clear summaries of findings and next steps
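
For illustration only, here is a minimal sketch of the kind of metric utilities mentioned above. It assumes WER is computed on whitespace-separated words with no text normalization, and that IAA is measured as Cohen's kappa between two annotators; the posting does not specify either choice, and the function names `word_error_rate` and `cohen_kappa` are hypothetical.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def cohen_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for two annotators labeling the same items (assumed IAA measure)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Share of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(cohen_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))
```

In practice, client guidelines would usually determine how text is normalized before scoring (casing, punctuation, numerals) and which agreement statistic applies when more than two annotators are involved.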
Requirements:
- 3 years of experience in technical delivery / project coordination in a data environment (data, analytics, ML, QA automation, or platform operations)
- Practical comfort with Python (scripting for metrics and data validation workflows)
- Practical comfort with SQL (queries used in automated checks)
- Practical comfort with regex (pattern-based validation)
- Practical comfort with the command line / Windows CMD (bulk file operations)
- Strong written communication and the ability to convert fuzzy requirements into precise, testable checks
- Experience working with engineering teams and using tools like Jira to drive execution
- Ability to evaluate options and recommend an approach based on pros/cons, timelines, and maintainability
- Experience with speech/audio or text datasets (given WER and annotation agreement use cases)
- Familiarity with cloud data workflows (especially AWS S3 concepts like buckets, prefixes, access patterns)
- Experience with data labeling/annotation workflows and quality frameworks