Responsibilities
Design, build, and optimize data pipelines and infrastructure for AI products
Collaborate closely with AI/ML teams, product teams, and security/compliance partners
Develop and operate ETL/ELT workflows
Implement and optimize vector database systems and embeddings pipelines
Architect and manage Azure-based data infrastructure
Build internal tools for metadata extraction and document parsing
Monitor and improve pipeline performance and reliability
Requirements
10+ years in Data Engineering, Software Engineering, or ML/Data Infrastructure roles
Strong experience with Python, SQL, and modern data engineering tools (Airflow, Dagster, dbt, Prefect, etc.)
Experience building large-scale document extraction ETL pipelines (OCR, PDF parsing, metadata extraction, NLP preprocessing)
Proficiency with Kubernetes, Docker, and containerized data pipelines deployed on Azure, AWS, and/or Google Cloud
Hands-on experience with relational databases (Postgres, SQL Server, MySQL) and non-relational systems such as Elasticsearch, Redis, and graph databases
Experience working with document-heavy or text-heavy datasets at scale
Strong mindset around data quality, governance, lineage, and validation
Excellent communicator who can align with ML, engineering, and product teams