Design, build, and maintain the core AI platform infrastructure that supports classic machine learning, GenAI/LLM workloads, and emerging agentic AI systems.
Implement and manage cloud‑native environments in AWS, including compute, networking, IAM, security controls, and serverless or containerized runtimes for AI workloads.
Build scalable data and model infrastructure across Snowflake, Databricks (Delta Lake, Unity Catalog), and Dataiku, enabling unified governance, observability, lineage, and automation.
Develop Infrastructure‑as‑Code (IaC) modules, environment templates, and reusable platform components to accelerate AI solution delivery.
Deploy and operationalize vector databases, embedding pipelines, orchestration frameworks, and retrieval systems to support RAG and agentic AI architectures.
Partner with Data Engineers, ML Engineers, MLOps, and Architects to deliver secure, reliable, high‑performance AI environments and production runtimes.
Implement monitoring, alerting, logging, and cost‑optimization frameworks for all AI platform services, ensuring stability and operational excellence.
Support environment provisioning, workspace configuration, cluster management, CI/CD integration, and platform‑level testing required for scalable AI deployment.
Ensure compliance with enterprise security, data governance, identity standards, and responsible AI guidelines across all AI modalities.
Requirements
5 or more years of experience in platform engineering, cloud engineering, MLOps, DevOps, or a related technical discipline.
Strong hands-on experience with AWS services such as IAM, VPC, S3, Lambda, ECS/EKS, Step Functions, and CloudWatch, along with networking and security best practices.
Practical experience implementing and supporting AI or ML platforms, including compute environments, containerization, and production model or LLM service deployment.
Experience with Databricks, including workspace configuration, cluster/pool setup, Unity Catalog, Delta Lake, and integration with enterprise identity and governance.
Working knowledge of Snowflake architecture, storage/compute separation, security, and integration with AI workflows.
Experience with Dataiku for automation (Scenarios), environment setup, execution engines, and project-level governance.
Proficiency with Infrastructure-as-Code (ideally Terraform), CLI-based provisioning, and Git-based workflow automation.
Strong understanding of security fundamentals: least privilege, tokenization and secrets management, data access controls, network segmentation, and auditability.
Ability to collaborate with cross-functional AI teams and translate architectural guidance into robust platform implementations.
Tech Stack
AWS
Cloud
Terraform
Unity Catalog
Benefits
Employer-subsidized Medical, Dental, Vision, and Life Insurance
Short-Term and Long-Term Disability
401(k) match
Flexible Spending Accounts
Health Savings Accounts
EAP
Educational Assistance
Parental Leave
Paid Time Off (for vacation, personal business, sick time, and parental leave)