Responsibilities
Design, build, and optimize data pipelines and infrastructure for AI products
Collaborate closely with AI/ML teams, product teams, and security/compliance partners
Develop and operate ETL/ELT workflows
Implement and optimize vector database systems and embeddings pipelines
Architect and manage Azure-based data infrastructure
Build internal tools for metadata extraction and document parsing
Monitor and improve pipeline performance and reliability
Requirements
10+ years in Data Engineering, Software Engineering, or ML/Data Infrastructure roles
Strong experience with Python, SQL, and modern data engineering tools (Airflow, Dagster, dbt, Prefect, etc.)
Experience building large-scale document extraction ETL pipelines (OCR, PDF parsing, metadata extraction, NLP preprocessing)
Proficiency with Kubernetes, Docker, and containerized data pipelines deployed on Azure, AWS, and/or Google Cloud
Hands-on experience with relational databases (Postgres, SQL Server, MySQL) and non-relational systems such as Elasticsearch, Redis, and graph databases
Experience working with document-heavy or text-heavy datasets at scale
Strong mindset around data quality, governance, lineage, and validation
Excellent communicator who can align with ML, engineering, and product teams